Electronic Engineering

Radius vector-driven 3-D Mandarin vocal tract model

  • YAO Yun,
  • WU Xiyu,
  • KONG Jiangping
  • 1. College of Chinese Language and Literature, Henan University, Kaifeng 475001, China;
    2. Department of Chinese Language and Literature, Peking University, Beijing 100871, China

Received date: 2016-05-06

  Online published: 2017-09-15

Abstract

Accurately characterizing vocal tract resonance requires detailed knowledge of the structure of the vocal tract and of how its shape changes during speech. This article extracts vocal tract contours, midlines, and midsagittal radius vector data from 3-D MRI images of seven sustained Mandarin monophthongs: [a], [o], [r], [i], [u], [y], and [e]. Thirty-six cross sections are taken at equal intervals along the vocal tract midline, from the lips to the glottis. At each section position, the cross-sectional shape is made to transition linearly according to the length of the radius vector, which yields a 3-D Mandarin vocal tract model driven by radius vectors. The formants of the model were computed and used to synthesize speech samples. In a listening test against natural speech, the model achieved good synthesis quality.

Cite this article

YAO Yun, WU Xiyu, KONG Jiangping. Radius vector-driven 3-D Mandarin vocal tract model[J]. Journal of Tsinghua University (Science and Technology), 2017, 57(9): 945-951. DOI: 10.16511/j.cnki.qhdxxb.2017.26.045
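The two geometric steps named in the abstract, equidistant sectioning along the midline and a linear transition of the section shape driven by radius vector length, can be sketched as follows. This is only an illustrative reading of the abstract, not the authors' implementation. The function names `resample_midline` and `interpolate_section` are hypothetical, and so is the assumption that each section has at least two reference contours (for example, one per measured vowel), all sampled with the same number of points, and that a new shape is blended from the two references whose radius vector lengths bracket the requested value.

```python
import numpy as np


def resample_midline(midline, n_sections=36):
    """Return n_sections points equally spaced in arc length along a polyline.

    midline: (N, D) array of ordered points, e.g. from the lips to the glottis.
    """
    midline = np.asarray(midline, dtype=float)
    # Cumulative arc length at every original midline point.
    seg = np.linalg.norm(np.diff(midline, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    # Target arc-length positions, equally spaced from start to end.
    targets = np.linspace(0.0, s[-1], n_sections)
    # Interpolate each coordinate separately as a function of arc length.
    cols = [np.interp(targets, s, midline[:, d]) for d in range(midline.shape[1])]
    return np.column_stack(cols)


def interpolate_section(radius, ref_radii, ref_contours):
    """Blend reference contours at one section according to radius length.

    ref_radii:    (K,) radius vector lengths of K >= 2 reference shapes.
    ref_contours: (K, P, 2) contours, each with the same P points.
    Assumption: linear blending between the two bracketing references.
    """
    ref_radii = np.asarray(ref_radii, dtype=float)
    ref_contours = np.asarray(ref_contours, dtype=float)
    order = np.argsort(ref_radii)
    r, c = ref_radii[order], ref_contours[order]
    # Clamp to the measured range so the model never extrapolates.
    radius = float(np.clip(radius, r[0], r[-1]))
    hi = int(np.searchsorted(r, radius, side="left"))
    hi = min(max(hi, 1), len(r) - 1)
    lo = hi - 1
    span = r[hi] - r[lo]
    w = 0.0 if span == 0 else (radius - r[lo]) / span
    return (1.0 - w) * c[lo] + w * c[hi]


# Made-up illustration data, not values from the article.
sections = resample_midline([[0.0, 0.0], [10.0, 0.0], [10.0, 25.0]], 36)
narrow = np.zeros((4, 2))
wide = np.full((4, 2), 2.0)
shape = interpolate_section(1.5, [1.0, 2.0], [narrow, wide])
```

Clamping the radius to the measured range is a design choice of this sketch: it keeps every generated section between shapes that were actually observed, at the cost of not representing more extreme articulations.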
