Effect of speaking rate on the formant dynamics of triphthongs

CAO Honglin; WANG Yujing; LI Jingyang

doi:10.16511/j.cnki.qhdxxb.2017.26.047

Journal of Tsinghua University (Science and Technology)

2017, Vol. 57, Issue 9: 958-962

DOI: https://doi.org/10.16511/j.cnki.qhdxxb.2017.26.047

Electronic Engineering



Effect of speaking rate on the formant dynamics of triphthongs

CAO Honglin, WANG Yujing, LI Jingyang


1. Collaborative Innovation Center of Judicial Civilization, Beijing 100088, China;
2. Key Laboratory of Evidence Science (China University of Political Science and Law), Ministry of Education, Beijing 100088, China;
3. Key Laboratory of Intelligent Speech Technology, Ministry of Public Security, Beijing 100038, China;
4. Control Commission of Chaoyang District, Beijing 100026, China;
5. Ministry of Public Security Evidence Identification Center, Beijing 100038, China

Received: 2016-05-12

Published online: 2017-09-15



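Cite this article

CAO Honglin, WANG Yujing, LI Jingyang. Effect of speaking rate on the formant dynamics of triphthongs[J]. Journal of Tsinghua University (Science and Technology), 2017, 57(9): 958-962. DOI: 10.16511/j.cnki.qhdxxb.2017.26.047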

Abstract

This study investigates individual differences in the formant dynamics of four Mandarin Chinese triphthongs (/iau/, /iou/, /uai/ and /uei/) produced by thirty male speakers aged 18 to 28 at three speaking rates (fast, normal and slow). The dynamic trajectories of the first three formants are described by cubic polynomial fits, and the fitted coefficients serve as predictors in a discriminant analysis whose aim is to distinguish speakers. When the compared samples share the same speaking rate, discrimination varies with rate: fast speech discriminates best (76.7% on average), while normal and slow speech score lower (69.5% and 70.3%, respectively). When samples at different speaking rates are compared, discrimination drops for every triphthong, and the "fast + slow" combination performs worst (48.0% on average). Under all rate conditions, /iau/ is the most discriminating of the four triphthongs. The formant dynamics of triphthongs can therefore distinguish speakers effectively when speaking rates are the same or similar.
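The cubic-polynomial description of a formant trajectory mentioned in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names `fit_cubic` and `trajectory_features` are hypothetical, and normalising time to the interval [0, 1] is an assumption made here so that tokens of different duration yield comparable coefficients. The resulting coefficient vector (four coefficients for each of F1, F2 and F3) is the kind of feature set that could then be passed to a discriminant analysis.

```python
def fit_cubic(times, values):
    """Least-squares fit of values ~ c0 + c1*t + c2*t**2 + c3*t**3.

    Returns [c0, c1, c2, c3]. Solves the 4x4 normal equations by
    Gaussian elimination with partial pivoting (no external libraries).
    Needs at least four distinct time points.
    """
    n = 4
    # Normal equations (X^T X) c = X^T y, with X[i][j] = t_i ** j.
    a = [[sum(t ** (i + j) for t in times) for j in range(n)] for i in range(n)]
    b = [sum(v * t ** i for t, v in zip(times, values)) for i in range(n)]

    # Forward elimination.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]

    # Back substitution.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs


def trajectory_features(formant_tracks):
    """Concatenate the cubic coefficients of each formant track.

    formant_tracks: list of tracks (e.g. [F1, F2, F3]), each a list of
    frequency values in Hz sampled at equal intervals through the vowel.
    Time is normalised to [0, 1] (an assumption of this sketch).
    """
    features = []
    for track in formant_tracks:
        steps = len(track) - 1
        times = [i / steps for i in range(len(track))]
        features.extend(fit_cubic(times, track))
    return features
```

Calling `trajectory_features([f1, f2, f3])` on one triphthong token returns twelve numbers. For real data, a library routine such as NumPy's `polyfit` would normally replace the hand-written solver; the explicit version is shown only to keep the example self-contained.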

Key words: triphthong; formant dynamics; polynomial fitting; discriminant analysis; speaking rate

