Journal of Tsinghua University (Science and Technology), 2016, Vol. 56, Issue 11: 1190-1195. DOI: 10.16511/j.cnki.qhdxxb.2016.26.010
Voice activity detection in complex noise environment
GUO Wu (郭武), MA Xiaokong (马啸空)
National Engineering Laboratory for Speech and Language Information Processing, School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China
Abstract: This paper proposes a robust voice activity detection (VAD) method for complex noise conditions. Three acoustic parameters, namely the energy, the most dominant frequency component, and the short-time spectral entropy, form a three-dimensional feature; these parameters strongly complement one another across a wide variety of noise types. During speech pulse detection, the K-means clustering algorithm adaptively selects the feature and computes the utterance-dependent thresholds used in the subsequent detection process. Experiments on the NIST Speaker Recognition Evaluation (SRE) 2008 and 2012 tasks show that the proposed method performs well across different noise conditions and is more robust and efficient than conventional unsupervised and supervised VAD algorithms.
Keywords: speaker recognition; voice activity detection; spectral entropy; K-means clustering
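As a rough illustration of the kind of pipeline the abstract describes, the Python sketch below computes three per-frame features (log energy, dominant frequency, spectral entropy) and derives an utterance-dependent threshold with a two-cluster K-means on one feature. It is not the authors' implementation. The function names `frame_features` and `two_means_threshold`, the frame length and hop, reading the "most dominant component" as the frequency of the spectral peak, and placing the threshold midway between the two cluster centers are all assumptions made for this example. The adaptive feature selection and the state-transition logic of the paper are not reproduced.

```python
import numpy as np


def frame_features(signal, sr=8000, frame_len=256, hop=128):
    """Return an (n_frames, 3) array of per-frame features:
    log energy, dominant frequency in Hz, spectral entropy (nats).
    Frame length and hop are illustrative choices, not the paper's."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    feats = np.empty((n_frames, 3))
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        total = power.sum() + 1e-12
        prob = power / total                      # spectrum as a distribution
        feats[i, 0] = np.log(total)               # log energy
        feats[i, 1] = freqs[np.argmax(power)]     # frequency of the spectral peak
        feats[i, 2] = -np.sum(prob * np.log(prob + 1e-12))  # spectral entropy
    return feats


def two_means_threshold(values, n_iter=50):
    """1-D K-means with K=2; returns (threshold, centers).
    The midpoint rule for the threshold is an assumption of this sketch."""
    centers = np.array([values.min(), values.max()], dtype=float)
    for _ in range(n_iter):
        # Assign each value to its nearest center.
        labels = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        new_centers = centers.copy()
        for k in range(2):
            if np.any(labels == k):
                new_centers[k] = values[labels == k].mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return float(centers.mean()), centers


# Synthetic test utterance: weak white noise with a 300 Hz tone burst
# standing in for speech between samples 3000 and 5000.
rng = np.random.default_rng(0)
sr = 8000
t = np.arange(sr) / sr
signal = 0.01 * rng.standard_normal(sr)
signal[3000:5000] += np.sin(2 * np.pi * 300 * t[3000:5000])

feats = frame_features(signal, sr=sr)
threshold, centers = two_means_threshold(feats[:, 0])
is_speech = feats[:, 0] > threshold   # frame-level decision on log energy only
```

In this toy case the energy feature alone separates the burst from the noise. In noise where energy is unreliable, the same clustering step could be applied to the entropy or dominant-frequency column instead, which is the sort of complementarity the abstract refers to.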
Received: 2016-06-21      Published: 2016-11-26
Chinese Library Classification (CLC) number: TN912.34
Cite this article: GUO Wu, MA Xiaokong. Voice activity detection in complex noise environment (original Chinese title: 复杂噪声场景下的活动语音检测方法) [J]. Journal of Tsinghua University (Science and Technology), 2016, 56(11): 1190-1195.
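  Fig. 1 The proposed VAD method
  Fig. 2 Illustration of the energy threshold and the corresponding start and end frames of a speech pulse
  Fig. 3 State transition relations
  Table 1 Experimental results on NIST SRE 2008
  Table 2 Experimental results on NIST SRE 2012 (EER%/MinC12)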
