COMPUTER SCIENCE AND TECHNOLOGY
Score regulation based on GMM token ratio similarity for speaker recognition
YANG Yingchun, DENG Licai
College of Computer Science & Technology, Zhejiang University, Hangzhou 310027, China
Abstract
This paper presents a score regulation approach for speaker recognition that judges the reliability of a test score by the similarity of GMM token ratios. In the GMM-UBM (Gaussian mixture model-universal background model) method, the GMM token of each frame, i.e., the index of the UBM component that gives the highest score for that frame, is recorded during both the training and test phases, and the tokens of an utterance form a vector called the GMM token ratio (GTR) of that utterance. In the test phase, the GTR of the test utterance is compared with the GTR of the target speaker's training utterance to compute their similarity. When the similarity is below a threshold, the original likelihood score is multiplied by a penalty factor, and the regulated value is taken as the final score of the test utterance. Tests on the MASC corpus show that this method improves speaker recognition performance.
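The scoring pipeline outlined in the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions the abstract does not state: the UBM is a diagonal-covariance GMM (the `DiagonalUBM` class is hypothetical), a component's per-frame score is its weighted log-density, the GTR is the fraction of frames assigned to each component, cosine similarity stands in for the unspecified similarity measure, and the score is positive so that a penalty factor below 1 lowers it. All function names, thresholds and toy data are illustrative.

```python
import math
from typing import List, Sequence


class DiagonalUBM:
    """Hypothetical diagonal-covariance UBM used only for illustration."""

    def __init__(self, weights, means, variances):
        self.weights = list(weights)
        self.means = [list(m) for m in means]
        self.variances = [list(v) for v in variances]

    def component_log_scores(self, frame: Sequence[float]) -> List[float]:
        # Assumed per-component score: log(w_k) + log N(x; mu_k, diag(var_k)).
        scores = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            log_det = sum(math.log(2.0 * math.pi * v) for v in var)
            mahal = sum((x - m) ** 2 / v for x, m, v in zip(frame, mu, var))
            scores.append(math.log(w) - 0.5 * (log_det + mahal))
        return scores


def frame_tokens(frames: Sequence[Sequence[float]], ubm: DiagonalUBM) -> List[int]:
    # GMM token of a frame: index of the UBM component with the highest score.
    tokens = []
    for frame in frames:
        scores = ubm.component_log_scores(frame)
        tokens.append(max(range(len(scores)), key=scores.__getitem__))
    return tokens


def gmm_token_ratio(tokens: Sequence[int], num_components: int) -> List[float]:
    # Assumed GTR definition: fraction of frames whose token is each component.
    if not tokens:
        return [0.0] * num_components
    counts = [0] * num_components
    for t in tokens:
        counts[t] += 1
    return [c / len(tokens) for c in counts]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Stand-in similarity measure; the abstract does not name the one used.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def regulate_score(score: float, train_gtr, test_gtr,
                   threshold: float, penalty: float) -> float:
    # Multiply by the penalty factor only when the GTRs are too dissimilar.
    # Assumes a positive score, so a penalty in (0, 1) lowers it.
    similarity = cosine_similarity(train_gtr, test_gtr)
    return score * penalty if similarity < threshold else score


# Toy two-component UBM in 2-D with equal weights and unit variances.
ubm = DiagonalUBM(weights=[0.5, 0.5],
                  means=[[0.0, 0.0], [5.0, 5.0]],
                  variances=[[1.0, 1.0], [1.0, 1.0]])
train_frames = [[0.1, -0.2], [0.3, 0.1], [5.2, 4.9], [4.8, 5.1]]
test_frames = [[0.0, 0.2], [0.1, 0.0], [-0.1, 0.1], [0.2, -0.1]]

train_gtr = gmm_token_ratio(frame_tokens(train_frames, ubm), 2)
test_gtr = gmm_token_ratio(frame_tokens(test_frames, ubm), 2)
final = regulate_score(2.0, train_gtr, test_gtr, threshold=0.8, penalty=0.5)
print(train_gtr, test_gtr, final)  # [0.5, 0.5] [1.0, 0.0] 1.0
```

With these toy values the training GTR is [0.5, 0.5] and the test GTR is [1.0, 0.0]; their cosine similarity (about 0.71) falls below the threshold of 0.8, so the score 2.0 is halved to 1.0. The threshold and penalty here are arbitrary placeholders, not values reported in the paper.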
Keywords
speaker recognition
GMM token ratio (GTR)
score regulation
Issue Date: 15 January 2017
|
|
|