Journal of Tsinghua University(Science and Technology), 2016, Vol. 56, Issue 11: 1143-1148. DOI: 10.16511/j.cnki.qhdxxb.2016.26.002
ELECTRONIC ENGINEERING
Speaker recognition system based on deep neural networks and bottleneck features
TIAN Yao, CAI Meng, HE Liang, LIU Jia
Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Abstract  A hybrid model that combines a deep neural network (DNN) trained for speech recognition with the i-vector model has been shown to be effective for speaker recognition. This work further improves system performance by using a DNN trained with speaker labels to extract bottleneck features, which replace the original short-term spectral features in statistics extraction so that the statistics carry more speaker-specific information. Tests on the NIST SRE 2008 female telephone-telephone-English task demonstrate the effectiveness of this method. Compared with the short-term spectral features, the bottleneck features give relative improvements of 7.65% in equal error rate (EER) and 5.71% in minimum detection cost function (minDCF).
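The statistics extraction step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual zeroth- and first-order statistics of the hybrid DNN/i-vector approach: frame posteriors come from the speech recognition DNN, and the frame features are the bottleneck features instead of spectral features. The toy logits, the toy feature values, and the function names softmax and baum_welch_stats are illustrative assumptions, not taken from the paper.

```python
import math


def softmax(logits):
    """Turn one frame of DNN output logits into class posteriors."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def baum_welch_stats(posteriors, features):
    """Accumulate zeroth- and first-order statistics over an utterance.

    posteriors: T frames x C classes (each row sums to 1)
    features:   T frames x D dimensions (here: bottleneck features)
    Returns (N, F) with N[c] = sum_t gamma_t(c) and
    F[c][d] = sum_t gamma_t(c) * x_t[d].
    """
    num_classes = len(posteriors[0])
    dim = len(features[0])
    zeroth = [0.0] * num_classes
    first = [[0.0] * dim for _ in range(num_classes)]
    for gamma_t, x_t in zip(posteriors, features):
        for c, gamma in enumerate(gamma_t):
            zeroth[c] += gamma
            for d, value in enumerate(x_t):
                first[c][d] += gamma * value
    return zeroth, first


# Toy utterance (assumed values): 3 frames, 2 output classes, 2-dim features.
asr_logits = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
posteriors = [softmax(frame) for frame in asr_logits]
bottleneck_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

N, F = baum_welch_stats(posteriors, bottleneck_feats)
print(N)
print(F)
```

In this sketch only the features argument changes when bottleneck features replace spectral features; the posteriors that weight each frame are unchanged.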
Keywords: speaker recognition; deep neural network; bottleneck features
CLC number (ZTFLH): TP391.4
Issue Date: 15 November 2016
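The abstract reports relative improvements in EER. As a rough illustration of how such figures are typically obtained, the sketch below estimates the EER from trial scores with a simple threshold sweep and computes a relative improvement between two systems. The scores and the function names are made up for illustration; they are not the paper's data or scoring tool.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping thresholds over all observed scores.

    A trial is accepted when its score is >= the threshold. The EER is
    taken at the threshold where the miss rate and the false-alarm rate
    are closest, as the average of the two rates.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for thr in thresholds:
        miss = sum(s < thr for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer


def relative_improvement(baseline, improved):
    """Relative reduction of an error metric, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Toy trial scores (assumed values, not from the paper).
targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.6, 0.4, 0.2, 0.1]

eer = equal_error_rate(targets, nontargets)
print(eer)  # 0.25 for these toy scores
print(relative_improvement(2.0, 1.5))  # 25.0
```

The same relative_improvement formula applies to minDCF, which additionally weights misses and false alarms with the costs and target prior fixed by the evaluation plan.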
Cite this article:
TIAN Yao, CAI Meng, HE Liang, et al. Speaker recognition system based on deep neural networks and bottleneck features[J]. Journal of Tsinghua University(Science and Technology), 2016, 56(11): 1143-1148.
URL:  
http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2016.26.002     OR     http://jst.tsinghuajournals.com/EN/Y2016/V56/I11/1143
  
  
  
  
  
Copyright © Journal of Tsinghua University(Science and Technology), All Rights Reserved.