COMPUTER SCIENCE AND TECHNOLOGY

Weighted phone log-likelihood ratio feature for spoken language recognition
ZHANG Jian1, XU Jie2, BAO Xiuguo2, ZHOU Ruohua1, YAN Yonghong1
1. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China;
2. National Computer Network Emergency Response Technical Team Coordination Center of China, Beijing 100029, China

Abstract Extracting linguistically discriminative features is one of the fundamental problems in spoken language recognition (SLR). The frame-level phone log-likelihood ratio (PLLR) feature was recently introduced to improve language recognition. This paper uses F-ratio analysis to measure how much each dimension of the SLR feature vector contributes to language discrimination. A weighted phone log-likelihood ratio (WPLLR) feature is then used to give more weight to the dimensions with high F-ratio values. Tests on the National Institute of Standards and Technology (NIST) 2007 SLR dataset show that this feature is effective, with significant relative improvements in both average cost performance and equal error rate over the PLLR feature.
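
The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that PLLR is the logit of each frame-level phone posterior, that the F-ratio of a dimension is the variance of the per-language means divided by the mean within-language variance, and that weighting means scaling each dimension by its F-ratio normalized to mean 1. The weighting rule and the function names `pllr`, `f_ratio`, and `weight_features` are hypothetical; the abstract does not specify the actual weighting function.

```python
import numpy as np


def pllr(posteriors, eps=1e-10):
    """Frame-level PLLR, assumed here to be the logit of each phone posterior.

    posteriors: array of shape (frames, phones) with values in [0, 1].
    """
    p = np.clip(posteriors, eps, 1.0 - eps)  # avoid log(0)
    return np.log(p) - np.log1p(-p)


def f_ratio(features, labels):
    """Per-dimension F-ratio: variance of the class means divided by
    the mean of the within-class variances.

    features: array of shape (frames, dims); labels: array of shape (frames,).
    """
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    within = np.stack([features[labels == c].var(axis=0) for c in classes])
    return means.var(axis=0) / within.mean(axis=0)


def weight_features(features, ratios):
    """Hypothetical weighting rule (an assumption, not taken from the paper):
    scale each dimension by its F-ratio normalized to have mean 1."""
    weights = ratios / ratios.mean()
    return features * weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.repeat([0, 1, 2], 200)       # three synthetic "languages"
    feats = rng.normal(size=(600, 2))
    feats[:, 0] += labels * 3.0              # only dimension 0 separates the classes
    ratios = f_ratio(feats, labels)
    weighted = weight_features(feats, ratios)
    print("F-ratios:", ratios)
    print("weighted shape:", weighted.shape)
```

On the synthetic data in the usage block, the dimension that separates the classes receives the larger F-ratio and is therefore scaled up relative to the uninformative one.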

Keywords: speech signal processing; spoken language recognition; linguistic discrimination; weighted phone log-likelihood ratio (WPLLR); F-ratio

Issue Date: 15 October 2017