Journal of Tsinghua University (Science and Technology)  2016, Vol. 56 Issue (11): 1190-1195    DOI: 10.16511/j.cnki.qhdxxb.2016.26.010
Electronic Engineering
Voice activity detection in complex noise environment
GUO Wu, MA Xiaokong
National Engineering Laboratory for Speech and Language Information Processing, School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China
Abstract: This paper proposes a robust voice activity detection (VAD) method for a wide range of complex noise conditions. Three acoustic parameters, the energy, the most dominant frequency component, and the short-time spectral entropy, form a three-dimensional feature; these parameters strongly complement one another across many kinds of noise. During speech pulse detection, the K-means clustering algorithm adaptively selects the feature and computes the utterance-dependent thresholds used in the detection process. Experiments on the NIST (National Institute of Standards and Technology) Speaker Recognition Evaluation (SRE) 2008 and 2012 tasks show that the proposed method performs well across different noise conditions and is more robust and efficient than conventional unsupervised and supervised VAD algorithms.
Key words: speaker recognition; voice activity detection; spectral entropy; K-means clustering
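As a rough illustration of the ideas named in the abstract, the sketch below computes the three per-frame parameters (log energy, most dominant frequency component, spectral entropy) and derives an utterance-dependent threshold from a one-dimensional K-means with two clusters. It is not the authors' implementation: the frame length, hop size, Hamming window, the function names `frame_features` and `two_means_threshold`, and the choice of the midpoint between the two centroids as the threshold are all assumptions made for illustration, and the feature-selection step and the speech pulse state machine are not shown because the abstract does not specify them.

```python
import numpy as np


def frame_features(signal, sample_rate, frame_len=0.025, hop_len=0.010):
    """Return an (n_frames, 3) array of per-frame features:
    log energy, dominant frequency in Hz, and spectral entropy.
    Frame and hop lengths (seconds) are illustrative assumptions."""
    n = int(round(frame_len * sample_rate))
    hop = int(round(hop_len * sample_rate))
    n_frames = 1 + (len(signal) - n) // hop
    window = np.hamming(n)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    feats = np.empty((n_frames, 3))
    for i in range(n_frames):
        frame = signal[i * hop:i * hop + n] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        total = power.sum() + 1e-12
        prob = power / total  # spectrum normalised to a distribution
        feats[i, 0] = np.log(total)                         # log energy
        feats[i, 1] = freqs[np.argmax(power)]               # dominant frequency
        feats[i, 2] = -np.sum(prob * np.log(prob + 1e-12))  # spectral entropy
    return feats


def two_means_threshold(values, n_iter=50):
    """Run 1-D K-means with K=2 on one feature and return the midpoint
    between the two centroids as a threshold (an assumed decision rule)."""
    lo, hi = values.min(), values.max()
    for _ in range(n_iter):
        assign_hi = np.abs(values - hi) < np.abs(values - lo)
        if assign_hi.all() or not assign_hi.any():
            break  # degenerate case: all values fall in one cluster
        new_lo = values[~assign_hi].mean()
        new_hi = values[assign_hi].mean()
        if new_lo == lo and new_hi == hi:
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return 0.5 * (lo + hi)
```

Under these assumptions, frames whose log energy exceeds `two_means_threshold(feats[:, 0])` would be treated as speech candidates. Because the threshold is recomputed for every utterance, it adapts to the noise level of that recording rather than relying on a fixed global value.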
Received: 2016-06-21      Published: 2016-11-15
CLC number (ZTFLH):  TN912.34
Cite this article:
GUO Wu, MA Xiaokong. Voice activity detection in complex noise environment[J]. Journal of Tsinghua University (Science and Technology), 2016, 56(11): 1190-1195.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2016.26.010  or  http://jst.tsinghuajournals.com/CN/Y2016/V56/I11/1190
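  Fig. 1 The proposed VAD method
  Fig. 2 Energy thresholds and the corresponding start and end frames of a speech pulse
  Fig. 3 State transition diagram
  Table 1 Experimental results on NIST SRE 2008
  Table 2 Experimental results on NIST SRE 2012 (EER%/MinC12)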