Computer Science and Technology

Weighted phone log-likelihood ratio feature applied to spoken language recognition

  • ZHANG Jian (张健),
  • XU Jie (徐杰),
  • BAO Xiuguo (包秀国),
  • ZHOU Ruohua (周若华),
  • YAN Yonghong (颜永红)
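  • 1. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China;
    2. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China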

Received: 2016-06-22

  Published online: 2017-10-15

Weighted phone log-likelihood ratio feature for spoken language recognition

  • ZHANG Jian ,
  • XU Jie ,
  • BAO Xiuguo ,
  • ZHOU Ruohua ,
  • YAN Yonghong
  • 1. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China;
    2. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China

Received: 2016-06-22

  Published online: 2017-10-15

Abstract

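One of the key problems in spoken language recognition is extracting language-discriminative information from the speech signal. Recently, a new feature, the phone log-likelihood ratio (PLLR), was introduced into language recognition and has shown excellent performance. This paper uses the F-ratio method to analyze how much language-discriminative information each dimension of the PLLR feature vector carries, and proposes the weighted phone log-likelihood ratio (weighted PLLR, WPLLR) feature, which assigns higher weights to the PLLR components that contain more language-discriminative information. Experiments on the National Institute of Standards and Technology (NIST) 2007 language recognition test set show that, compared with the original PLLR feature, the proposed WPLLR feature significantly reduces both the average detection cost and the equal error rate.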
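
For orientation, the quantities named in the abstract can be written out as below. This is a sketch under stated assumptions, not the paper's own notation: the PLLR is given in the logit form commonly used in the PLLR literature (some formulations differ by a constant offset of log(N-1)), the F-ratio is given in its usual between-class over within-class variance form, and the weight w_d is only an assumed proportional form because the abstract does not state the weighting function. Here p_d(t) is the posterior of phone d at frame t, N is the number of phones, L is the number of languages, and \mu_{l,d} and \sigma_{l,d}^2 are the mean and variance of dimension d over the training frames of language l.

```latex
% PLLR of phone d at frame t (logit of the phone posterior)
r_d(t) = \log \frac{p_d(t)}{1 - p_d(t)}, \qquad d = 1, \dots, N

% F-ratio of dimension d: variance of the language means over the mean within-language variance
F_d = \frac{\frac{1}{L} \sum_{l=1}^{L} \left( \mu_{l,d} - \bar{\mu}_d \right)^2}
           {\frac{1}{L} \sum_{l=1}^{L} \sigma_{l,d}^2},
\qquad \bar{\mu}_d = \frac{1}{L} \sum_{l=1}^{L} \mu_{l,d}

% Weighted PLLR (assumed form: weight grows with the F-ratio)
\tilde{r}_d(t) = w_d \, r_d(t), \qquad w_d \propto F_d
```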

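Cite this article

ZHANG Jian, XU Jie, BAO Xiuguo, ZHOU Ruohua, YAN Yonghong. Weighted phone log-likelihood ratio feature for spoken language recognition[J]. Journal of Tsinghua University (Science and Technology), 2017, 57(10): 1038-1041, 1047. DOI: 10.16511/j.cnki.qhdxxb.2017.25.042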

Abstract

Extracting language-discriminative features is one of the fundamental issues in spoken language recognition (SLR). The frame-level phone log-likelihood ratio (PLLR) feature was recently introduced to SLR and has shown excellent performance. In this paper, F-ratio analysis is used to measure how much language-discriminative information each dimension of the PLLR feature vector carries. A weighted phone log-likelihood ratio (WPLLR) feature is then proposed that assigns higher weights to the dimensions with high F-ratio values. Tests on the National Institute of Standards and Technology (NIST) 2007 language recognition evaluation dataset show that the WPLLR feature significantly reduces both the average detection cost and the equal error rate relative to the PLLR feature.
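
The pipeline described in the abstract (PLLR extraction, per-dimension F-ratio analysis, then weighting) might be sketched in Python with NumPy as follows. This is a minimal illustration under assumptions, not the authors' implementation: the function names `pllr`, `f_ratio`, and `wpllr` are hypothetical, the PLLR uses the common logit form, and the weights are simply the F-ratios normalized to mean 1, since the abstract does not give the actual weighting function.

```python
import numpy as np


def pllr(posteriors, eps=1e-10):
    """Frame-level PLLR as the logit of each phone posterior (one common form).

    posteriors: array of shape (frames, phones), each row summing to 1.
    """
    p = np.clip(posteriors, eps, 1.0 - eps)  # avoid log(0)
    return np.log(p) - np.log1p(-p)


def f_ratio(features, labels, eps=1e-10):
    """Per-dimension F-ratio: variance of the class means divided by the
    mean within-class variance (population variances)."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    within = np.stack([features[labels == c].var(axis=0) for c in classes])
    return means.var(axis=0) / np.maximum(within.mean(axis=0), eps)


def wpllr(features, f):
    """ASSUMED weighting: scale each dimension by its F-ratio normalized to
    mean 1. The paper's actual weighting function may differ."""
    weights = f / f.mean()
    return features * weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins: 200 frames, 5 phones, 3 languages (frame-level labels).
    posteriors = rng.dirichlet(np.ones(5), size=200)
    labels = rng.integers(0, 3, size=200)
    feats = pllr(posteriors)
    weighted = wpllr(feats, f_ratio(feats, labels))
    print(weighted.shape)
```

In practice the F-ratios would be estimated once on training data and the resulting weights reused unchanged at test time, so that training and test features live in the same weighted space.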
