Weighted phone log-likelihood ratio feature for spoken language recognition

ZHANG Jian, XU Jie, BAO Xiuguo, ZHOU Ruohua, YAN Yonghong

Journal of Tsinghua University(Science and Technology) ›› 2017, Vol. 57 ›› Issue (10) : 1038-1041,1047.

DOI: 10.16511/j.cnki.qhdxxb.2017.25.042
COMPUTER SCIENCE AND TECHNOLOGY


Abstract

Extracting linguistically discriminative features is one of the fundamental problems in spoken language recognition (SLR). The frame-level phone log-likelihood ratio (PLLR) feature was recently introduced to improve language recognition. This paper uses F-ratio analysis to measure how much each dimension of the SLR feature vector contributes to language discrimination. It then proposes a weighted phone log-likelihood ratio (WPLLR) feature that weights the dimensions with high F-ratio values more heavily. Tests on the National Institute of Standards and Technology (NIST) 2007 dataset for SLR show the effectiveness of this feature, with significant relative improvements in average cost performance and equal error rate compared with the PLLR feature.
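The pipeline the abstract describes (PLLR features, per-dimension F-ratio analysis, then weighting) can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions: the PLLR is computed in its simplified logit form, the F-ratio is taken as the variance of the per-language means divided by the average within-language variance, and `weights_from_f_ratio` is a hypothetical weighting rule (F-ratios scaled to average 1), since the abstract does not state the actual weighting function. The toy data and all function names are illustrative only.

```python
import math


def pllr(posteriors, eps=1e-10):
    """Map one frame of phone posteriors to log-likelihood ratios.

    Assumption: simplified logit form, log(p / (1 - p)).
    """
    out = []
    for p in posteriors:
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0) and division by zero
        out.append(math.log(p / (1.0 - p)))
    return out


def f_ratio(frames_by_language):
    """Per-dimension F-ratio.

    frames_by_language maps a language label to a list of feature vectors.
    Returns (variance of class means) / (mean within-class variance) per dimension.
    """
    dim = len(next(iter(frames_by_language.values()))[0])
    class_means, class_vars = [], []
    for frames in frames_by_language.values():
        n = len(frames)
        mean = [sum(f[d] for f in frames) / n for d in range(dim)]
        var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
        class_means.append(mean)
        class_vars.append(var)

    m = len(class_means)
    ratios = []
    for d in range(dim):
        global_mean = sum(cm[d] for cm in class_means) / m
        between = sum((cm[d] - global_mean) ** 2 for cm in class_means) / m
        within = sum(cv[d] for cv in class_vars) / m
        ratios.append(between / within if within > 0 else 0.0)
    return ratios


def weights_from_f_ratio(ratios):
    """Hypothetical weighting rule: scale the F-ratios so they average to 1."""
    mean = sum(ratios) / len(ratios)
    if mean == 0:
        return [1.0] * len(ratios)
    return [r / mean for r in ratios]


def apply_weights(frame, weights):
    """Scale each dimension of a feature vector by its weight."""
    return [w * x for w, x in zip(weights, frame)]


# Toy two-dimensional features for two languages (illustrative values only).
# Dimension 0 separates the languages well; dimension 1 barely does.
toy = {
    "lang_a": [[0.0, 1.0], [2.0, 1.2]],
    "lang_b": [[10.0, 0.8], [12.0, 1.0]],
}
ratios = f_ratio(toy)
weights = weights_from_f_ratio(ratios)
weighted_frame = apply_weights(toy["lang_a"][0], weights)
```

In this toy case the first dimension receives the larger weight because its class means are far apart relative to the spread within each language, which is the behaviour the F-ratio is meant to capture.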

Key words

speech signal processing / spoken language recognition / linguistic discrimination / weighted phone log-likelihood ratio (WPLLR) / F-ratio

Cite this article

ZHANG Jian, XU Jie, BAO Xiuguo, ZHOU Ruohua, YAN Yonghong. Weighted phone log-likelihood ratio feature for spoken language recognition[J]. Journal of Tsinghua University(Science and Technology). 2017, 57(10): 1038-1041,1047. https://doi.org/10.16511/j.cnki.qhdxxb.2017.25.042
