Journal of Tsinghua University(Science and Technology)    2018, Vol. 58 Issue (4) : 337-341     DOI: 10.16511/j.cnki.qhdxxb.2018.25.028
COMPUTER SCIENCE AND TECHNOLOGY
Score domain speaking rate normalization for speaker recognition
AISIKAER Rouzi1, WANG Dong1, LI Lantian1, ZHENG Fang1, ZHANG Xiaodong2, JIN Panshi2
1. Center for Speech and Language Technologies, Division of Technical Innovation and Development, Tsinghua National Laboratory for Information Science and Technology;Center for Speech and Language Technologies, Research Institute of Information Technology, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China;
2. Information Technology Management Department, China Construction Bank, Beijing 100000, China
Abstract  Speaking rate variations seriously degrade speaker recognition accuracy. This paper presents a score-domain normalization approach that reduces the impact of such variations. For each enrolled speaker, score distributions are computed against each type of impostor in a cohort set. Two kinds of cohort set are used: a global set, which consists of speech utterances at different speaking rates, and local sets, which are obtained by splitting the utterances in the global set according to their relative speaking rates. The scores for the test speech are then normalized with these distributions. The approach is evaluated on a self-recorded speaking rate database using a GMM-UBM (Gaussian mixture model-universal background model) framework, with the data sparsity problem handled by augmenting the training data, and gives a final relative EER (equal error rate) reduction of 33.33%. This study shows that both global and local score normalization effectively reduce the impact of speaking rate variations on speaker recognition.
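The abstract does not give the normalization formula, so the following is only a minimal sketch of how global and local cohort normalization might look, assuming a Z-norm style rule (subtract the cohort mean, divide by the cohort standard deviation). The function names, the rate classes "slow", "normal" and "fast", and all score values are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def znorm(raw_score, cohort_scores, eps=1e-8):
    """Shift and scale a raw score by impostor cohort statistics (Z-norm style)."""
    mu = mean(cohort_scores)
    sigma = pstdev(cohort_scores)
    # Guard against a degenerate cohort whose scores are all identical.
    return (raw_score - mu) / max(sigma, eps)


def normalize_score(raw_score, cohort, test_rate=None):
    """Normalize a test score for one enrolled speaker.

    cohort maps a relative speaking rate class to the scores of impostor
    cohort utterances of that class against the enrolled speaker's model.
    test_rate=None pools every class (global cohort); otherwise only the
    subset matching the test utterance's rate class is used (local cohort).
    """
    if test_rate is None:
        scores = [s for subset in cohort.values() for s in subset]
    else:
        scores = cohort[test_rate]
    return znorm(raw_score, scores)


# Hypothetical impostor scores against a single enrolled speaker.
cohort = {
    "slow": [-2.0, -1.0, 0.0],
    "normal": [0.0, 1.0, 2.0],
    "fast": [-3.0, -2.0, -1.0],
}

global_score = normalize_score(1.5, cohort)
local_score = normalize_score(1.5, cohort, test_rate="fast")
```

In this toy data the fast-rate impostors score lower and vary less than the pooled cohort, so the same raw score stands out more after local normalization than after global normalization.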
Keywords  speaker recognition; score domain; speaking rate normalization; relative speaking rate; GMM-UBM
ZTFLH:  TP391.4  
Issue Date: 15 April 2018
Cite this article:   
AISIKAER Rouzi, WANG Dong, LI Lantian, ZHENG Fang, ZHANG Xiaodong, JIN Panshi. Score domain speaking rate normalization for speaker recognition[J]. Journal of Tsinghua University(Science and Technology),2018, 58(4): 337-341.
URL:  
http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2018.25.028     OR     http://jst.tsinghuajournals.com/EN/Y2018/V58/I4/337
  
  
  
  
  
Copyright © Journal of Tsinghua University(Science and Technology), All Rights Reserved.