Journal of Tsinghua University (Science and Technology)  2017, Vol. 57 Issue (10): 1038-1041,1047    DOI: 10.16511/j.cnki.qhdxxb.2017.25.042
Computer Science and Technology
Weighted phone log-likelihood ratio feature for spoken language recognition
ZHANG Jian1, XU Jie2, BAO Xiuguo2, ZHOU Ruohua1, YAN Yonghong1
1. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China;
2. National Computer Network Emergency Response Technical Team Coordination Center of China, Beijing 100029, China
Abstract: Extracting language-discriminative information from the speech signal is one of the key problems in spoken language recognition (SLR). The frame-level phone log-likelihood ratio (PLLR) feature was recently introduced to SLR and has shown excellent performance. In this paper, F-ratio analysis is used to measure how much language-discriminative information each dimension of the PLLR feature vector carries. A weighted phone log-likelihood ratio (WPLLR) feature is then proposed that assigns higher weights to the dimensions with higher F-ratio values. Experiments on the National Institute of Standards and Technology (NIST) 2007 language recognition evaluation test set show that, compared with the original PLLR feature, the proposed WPLLR feature significantly reduces both the average detection cost and the equal error rate.
Key words: speech signal processing; spoken language recognition; linguistic discrimination; weighted phone log-likelihood ratio (WPLLR); F-ratio
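The abstract describes the WPLLR pipeline only at a high level, so the following is a minimal sketch and not the authors' implementation. It assumes the common PLLR definition (the logit of each frame-level phone posterior), an F-ratio computed per dimension as the variance of the per-language means divided by the mean within-language variance, and weights obtained by scaling the F-ratios to unit mean. The weight normalization and the function names `pllr`, `f_ratio`, and `wpllr` are assumptions made for illustration.

```python
import numpy as np


def pllr(posteriors, eps=1e-10):
    """Map frame-level phone posteriors (frames x phones) to PLLR features.

    Assumed definition: PLLR_i = log(p_i / (1 - p_i)), i.e. the logit.
    """
    p = np.clip(np.asarray(posteriors, dtype=float), eps, 1.0 - eps)
    return np.log(p) - np.log1p(-p)


def f_ratio(features, labels):
    """Per-dimension F-ratio: variance of the language means divided by
    the average within-language variance."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    languages = np.unique(labels)
    means = np.stack([features[labels == l].mean(axis=0) for l in languages])
    within = np.stack([features[labels == l].var(axis=0) for l in languages])
    between_var = means.var(axis=0)
    within_var = within.mean(axis=0)
    # Guard against division by zero for constant dimensions.
    return between_var / np.maximum(within_var, 1e-12)


def wpllr(features, ratios):
    """Scale each PLLR dimension by its F-ratio, normalized to unit mean.

    Hypothetical weighting: the paper's exact formula is not given in the
    abstract, so this normalization is an assumption.
    """
    ratios = np.asarray(ratios, dtype=float)
    weights = ratios / ratios.mean()
    return np.asarray(features, dtype=float) * weights


if __name__ == "__main__":
    # Toy data: two languages; only dimension 0 separates them.
    X = np.array([[1.0, 0.0], [3.0, 0.5], [-1.0, 0.5], [-3.0, 0.0]])
    y = np.array([0, 0, 1, 1])
    r = f_ratio(X, y)
    print("F-ratios:", r)
    print("Weighted features:\n", wpllr(X, r))
```

In this toy example the second dimension has identical means for both languages, so its F-ratio is zero and the weighting suppresses it entirely; a practical system would likely floor or smooth the weights instead of discarding dimensions.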
Received: 2016-06-22      Published: 2017-10-15
CLC number: TN912.3
Corresponding author: ZHOU Ruohua, Research Professor, E-mail: zhouruohua@hccl.ioa.ac.cn
Cite this article:
ZHANG Jian, XU Jie, BAO Xiuguo, ZHOU Ruohua, YAN Yonghong. Weighted phone log-likelihood ratio feature for spoken language recognition. Journal of Tsinghua University (Science and Technology), 2017, 57(10): 1038-1041,1047.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2017.25.042  or  http://jst.tsinghuajournals.com/CN/Y2017/V57/I10/1038
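Fig. 1 Flowchart of WPLLR feature extraction
Fig. 2 F-ratio values for each of the three recognizers
Table 1 Comparison of EER results
Table 2 Comparison of Cavg results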
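For readers unfamiliar with the two metrics compared in the tables, the sketch below shows one way to compute them from per-language detection scores. It assumes the closed-set NIST LRE 2007 cost settings (C_miss = C_FA = 1, P_target = 0.5), approximates the EER with a threshold sweep over the observed scores, and requires at least one test segment per language. The function names and the toy scores are hypothetical, and none of the paper's results are reproduced.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: sweep thresholds over all observed scores and return
    the mean of the miss and false-alarm rates where they are closest."""
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([tgt, non]))
    p_miss = np.array([(tgt < t).mean() for t in thresholds])
    p_fa = np.array([(non >= t).mean() for t in thresholds])
    i = np.argmin(np.abs(p_miss - p_fa))
    return 0.5 * (p_miss[i] + p_fa[i])


def c_avg(scores, labels, threshold=0.0, p_target=0.5):
    """Closed-set average detection cost with C_miss = C_FA = 1.

    scores: (segments x languages) detection scores.
    labels: true language index of each segment.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    n_lang = scores.shape[1]
    accept = scores >= threshold  # hard decision per (segment, language)
    p_nontarget = (1.0 - p_target) / (n_lang - 1)
    cost = 0.0
    for t in range(n_lang):
        # Miss: a segment of language t is rejected by detector t.
        p_miss = 1.0 - accept[labels == t, t].mean()
        # False alarm: a segment of language n is accepted by detector t.
        p_fa_sum = sum(accept[labels == n, t].mean()
                       for n in range(n_lang) if n != t)
        cost += p_target * p_miss + p_nontarget * p_fa_sum
    return cost / n_lang


if __name__ == "__main__":
    # Toy scores for two languages and four segments (hypothetical values).
    S = np.array([[1.0, -1.0], [-1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]])
    y = np.array([0, 0, 1, 1])
    print("Cavg:", c_avg(S, y))
    print("EER:", equal_error_rate([2.0, 1.0, -1.0], [-2.0, -1.5, 1.5]))
```

Both metrics are error measures, so lower values indicate better performance, which matches the abstract's statement that WPLLR reduces them relative to PLLR.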