Journal of Tsinghua University(Science and Technology)    2016, Vol. 56 Issue (11) : 1190-1195     DOI: 10.16511/j.cnki.qhdxxb.2016.26.010
ELECTRONIC ENGINEERING
Voice activity detection in complex noise environment
GUO Wu, MA Xiaokong
National Engineering Laboratory for Speech and Language Information Processing, School of Science and Technology, University of Science and Technology of China, Hefei 230027, China
Abstract: A voice activity detection (VAD) algorithm was developed for robust speech detection in complex noise conditions. The energy, the most dominant spectral component, and the spectral entropy are combined into a three-dimensional feature set whose components have been shown to strongly complement one another in the presence of complex noise. The K-means algorithm is used to adaptively select features and to compute utterance-dependent thresholds, which are then applied in the subsequent speech detection stage. Tests on the NIST SRE 2008 and 2012 corpora show that this algorithm performs well under different noise conditions and is more robust and efficient than conventional unsupervised and supervised methods.
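The abstract only outlines the feature set and the clustering step, so a minimal sketch of the idea is given below, assuming 16 kHz audio with 25 ms frames and a 10 ms hop. The function names, parameters, and the simplified decision rule (a single two-cluster K-means over all three standardized features, rather than the paper's adaptive feature selection and utterance-dependent thresholds) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: frame-level feature extraction plus per-utterance
# K-means clustering for VAD. Frame sizes, FFT length, and the decision rule
# are assumptions for demonstration, not the algorithm from the paper.
import numpy as np
from sklearn.cluster import KMeans

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping Hann-windowed frames
    (25 ms frames, 10 ms hop at 16 kHz); assumes len(x) >= frame_len."""
    x = np.asarray(x, dtype=float)
    n_frames = (len(x) - frame_len) // hop + 1
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hanning(frame_len)

def vad_features(frames, n_fft=512):
    """Per-frame log energy, most dominant spectral component, and spectral entropy."""
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    log_energy = np.log(np.sum(spec, axis=1) + 1e-10)
    dominant = np.log(np.max(spec, axis=1) + 1e-10)           # strongest spectral peak
    p = spec / (np.sum(spec, axis=1, keepdims=True) + 1e-10)  # normalized spectrum
    entropy = -np.sum(p * np.log(p + 1e-10), axis=1)
    # Negate entropy so that, assuming speech frames have lower spectral entropy
    # than noise, all three features grow with "speech-likeness".
    return np.stack([log_energy, dominant, -entropy], axis=1)

def kmeans_vad(features):
    """Two-cluster K-means per utterance; the cluster whose centroid has the
    larger summed feature values is labeled speech, so the effective
    decision threshold adapts to each utterance."""
    feats = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-10)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(feats)
    speech_cluster = int(np.argmax(km.cluster_centers_.sum(axis=1)))
    return km.labels_ == speech_cluster

# Example: frame-level speech/non-speech decisions for one utterance.
# decisions = kmeans_vad(vad_features(frame_signal(waveform)))
```

Clustering each utterance separately is what makes the decision utterance dependent: the speech/non-speech boundary adapts to the noise level of that recording instead of relying on a fixed global threshold.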
Keywords: speaker recognition; voice activity detection; spectral entropy; K-means
CLC number: TN912.34
Issue Date: 15 November 2016
Cite this article:   
GUO Wu, MA Xiaokong. Voice activity detection in complex noise environment[J]. Journal of Tsinghua University(Science and Technology), 2016, 56(11): 1190-1195.
URL: http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2016.26.010 OR http://jst.tsinghuajournals.com/EN/Y2016/V56/I11/1190
  
  
  
  
  
[1] Alam J, Kenny P, Ouellet P, et al. Supervised/unsupervised voice activity detectors for text-dependent speaker recognition on the RSR2015 corpus[C]//Proc of Speaker Odyssey 2014, Joensuu, Finland, 2014: 123-130.
[2] Ferrer L, McLaren M, Scheffer N, et al. A noise-robust system for NIST 2012 speaker recognition evaluation[C]//Proc of Interspeech 2013, Lyon, France: International Speech Communication Association, 2013: 1981-1984.
[3] Colibro D, Vair C, Farrell K, et al. Nuance-Politecnico di Torino's 2012 NIST speaker recognition evaluation system[C]//Proc of Interspeech 2013, Lyon, France: International Speech Communication Association, 2013: 1996-2000.
[4] Lamel L, Rabiner L R, Rosenberg A, et al. An improved endpoint detector for isolated word recognition[J]. IEEE Trans on Acoustics, Speech, and Signal Processing, 1981, 29(4): 777-785.
[5] Morales-Cordovilla J A, Ma N, Sanchez V, et al. A pitch-based noise estimation technique for robust speech recognition with missing data[C]//Proc of ICASSP 2011, Prague, Czech Republic: Institute of Electrical and Electronics Engineers Inc, 2011: 4808-4811.
[6] Renevey P, Drygajlo A. Entropy based voice activity detection in very noisy conditions[C]//Proc of Eurospeech 2001, Aalborg, Denmark: International Speech Communication Association, 2001: 1887-1890.
[7] Moattar M H, Homayounpour M M. A simple but efficient real-time voice activity detection algorithm[C]//Proc of EUSIPCO 2009, Glasgow, United Kingdom: European Signal Processing Conference (EUSIPCO), 2009: 2549-2553.
[8] Li Q, Zheng J, Tsai A, et al. Robust endpoint detection and energy normalization for real-time speech and speaker recognition[J]. IEEE Trans on Speech and Audio Processing, 2002, 10(3): 146-157.
[9] Kinnunen T, Rajan P. A practical, self-adaptive voice activity detector for speaker verification with noisy telephone and microphone data[C]//Proc of ICASSP 2013, Vancouver, BC, Canada: Institute of Electrical and Electronics Engineers Inc, 2013: 7229-7233.
[10] Yu H B, Mak M W. Comparison of voice activity detectors for interview speech in NIST speaker recognition evaluation[C]//Proc of Interspeech 2011, Florence, Italy: International Speech Communication Association, 2011.
[11] NIST. The NIST year 2008 speaker recognition evaluation plan[EB/OL].[2008-04-02]. http://www.itl.nist.gov/iad/mig/tests/sre/2008/sre08_evalplan_release4.pdf.
[12] NIST. The NIST year 2012 speaker recognition evaluation plan[EB/OL].[2012-05-30]. http://www.nist.gov/itl/iad/mig/upload/NIST_SRE12_evalplan-v17-r1.pdf.
[13] Guo W, Long Y H, Li Y J, et al. iFLY system for the NIST 2008 speaker recognition evaluation[C]//Proc of ICASSP 2009, Taipei, China:Institute of Electrical and Electronics Engineers Inc, 2009:4209-4212.
[14] Saeidi R, Lee K A, Kinnunen T, et al. I4U submission to NIST SRE 2012: A large-scale collaborative effort for noise-robust speaker verification[C]//Proc of Interspeech 2013, Lyon, France: International Speech Communication Association, 2013: 1986-1990.