Extracting linguistically discriminative features is a fundamental problem in spoken language recognition (SLR). The frame-level phone log-likelihood ratio (PLLR) feature was recently introduced to improve language recognition. In this paper, F-ratio analysis is used to measure how much each dimension of the SLR feature vector contributes to discrimination. A weighted phone log-likelihood ratio (WPLLR) feature is then used that gives more weight to the dimensions with high F-ratio values. Tests on the National Institute of Standards and Technology (NIST) 2007 SLR dataset show that this feature is effective, with significant relative improvements in average cost performance and equal error rate over the PLLR feature.
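The idea summarized above can be illustrated with a minimal sketch. It assumes the commonly used PLLR definition (the log-odds of each frame-level phone posterior) and the usual F-ratio (variance of the class means divided by the mean within-class variance, per dimension). The abstract does not give the paper's actual weighting function, so `weight_features` below is a hypothetical stand-in that scales each dimension by its mean-normalized F-ratio. All function names are illustrative.

```python
import numpy as np


def pllr(posteriors, eps=1e-10):
    """Frame-level PLLRs from phone posteriors of shape (frames, phones).

    Assumes the usual log-odds form: log(p / (1 - p)).
    """
    p = np.clip(posteriors, eps, 1.0 - eps)
    return np.log(p) - np.log1p(-p)


def f_ratio(features, labels):
    """Per-dimension F-ratio: between-class variance / mean within-class variance."""
    classes = np.unique(labels)
    class_means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    class_vars = np.stack([features[labels == c].var(axis=0) for c in classes])
    between = class_means.var(axis=0)   # spread of the class means
    within = class_vars.mean(axis=0)    # average spread inside each class
    return between / (within + 1e-12)


def weight_features(features, f):
    """Hypothetical weighting (not taken from the paper): scale each
    dimension by its F-ratio normalized to have mean 1."""
    return features * (f / f.mean())
```

In use, `f_ratio` would be estimated once on labeled training frames (one language label per frame), and the resulting vector applied to both training and test PLLR features through `weight_features`, so that dimensions with higher F-ratio values carry more weight in the back-end classifier.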
ZHANG Jian, XU Jie, BAO Xiuguo, ZHOU Ruohua, YAN Yonghong. Weighted phone log-likelihood ratio feature for spoken language recognition[J]. Journal of Tsinghua University (Science and Technology), 2017, 57(10): 1038-1041,1047.
DOI: 10.16511/j.cnki.qhdxxb.2017.25.042