Overlapping speech detection using high-level information features

doi:10.16511/j.cnki.qhdxxb.2017.21.015




Abstract: Overlapping speech is one of the main factors degrading speaker segmentation performance. This paper presents an overlapping speech detection method based on high-level information features to improve speaker segmentation. Linguistic high-level information features are first extracted from the speech with a universal background model (UBM). These features are then fused with Mel frequency cepstral coefficient (MFCC) features to train a hidden Markov model (HMM) that detects overlapping speech, and speaker segmentation is performed on the speech after this processing. Tests on a dataset generated from the TIMIT database show that the overlapping speech detection error rate is significantly lower than with the MFCC feature alone, and that speaker segmentation performance is clearly improved.

Key words: overlapping speech detection; high-level information feature; speaker segmentation
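
The feature step in the abstract can be illustrated with a rough sketch. The abstract does not define the high-level features, so everything specific here is an assumption: each frame is labeled with the index of its most likely UBM component (treated as a "phoneme-like symbol"), a frame-based symbol transition rate is computed over a sliding window, and that rate is appended to the MFCC vector as the fused feature. The function names, the window size, and the toy two-component UBM are hypothetical and are not taken from the paper.

```python
import math

def diag_gauss_logpdf(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def top_component(frame, ubm):
    """Index of the UBM component with the highest weighted log-likelihood.

    Assumption: this index stands in for a phoneme-like symbol.
    """
    scores = [
        math.log(w) + diag_gauss_logpdf(frame, mean, var)
        for w, mean, var in ubm
    ]
    return scores.index(max(scores))

def transition_rate(symbols, half_window):
    """Per-frame fraction of adjacent symbol pairs that differ in a window."""
    rates = []
    last = len(symbols) - 1
    for t in range(len(symbols)):
        lo = max(0, t - half_window)
        hi = min(last, t + half_window)
        pairs = hi - lo
        changes = sum(symbols[i] != symbols[i + 1] for i in range(lo, hi))
        rates.append(changes / pairs if pairs else 0.0)
    return rates

def fuse(mfcc_frames, ubm, half_window=2):
    """Append the transition rate to each MFCC frame (simple concatenation)."""
    symbols = [top_component(f, ubm) for f in mfcc_frames]
    rates = transition_rate(symbols, half_window)
    return [list(f) + [r] for f, r in zip(mfcc_frames, rates)]

# Toy data (hypothetical): a two-component UBM given as (weight, mean, variance)
# and six two-dimensional "MFCC" frames.
toy_ubm = [
    (0.5, [0.0, 0.0], [1.0, 1.0]),
    (0.5, [4.0, 4.0], [1.0, 1.0]),
]
toy_frames = [
    [0.1, -0.2], [0.0, 0.3], [3.9, 4.1],
    [0.2, 0.1], [4.2, 3.8], [4.0, 4.0],
]
toy_symbols = [top_component(f, toy_ubm) for f in toy_frames]
toy_rates = transition_rate(toy_symbols, half_window=1)
fused = fuse(toy_frames, toy_ubm, half_window=1)
```

In this sketch a higher transition rate means the dominant UBM component changes more often between neighboring frames. Whether that matches the paper's actual feature definition cannot be confirmed from the abstract. The fused vectors would then serve as observations for an HMM-based detector.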

Received: 2016-06-18; Published: 2017-01-15

Chinese Library Classification (ZTFLH): TN912.3

Corresponding author: BAO Changchun, Professor, E-mail: baochch@bjut.edu.cn

Cite this article:

MA Yong, BAO Changchun. Overlapping speech detection using high-level information features[J]. Journal of Tsinghua University (Science and Technology), 2017, 57(1): 79-83.

Link to this article:

http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2017.21.015 or http://jst.tsinghuajournals.com/CN/Y2017/V57/I1/79

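Fig. 1 Values of r_hops for overlapping speech with different numbers of speakers

Fig. 2 Values of r_ls for overlapping speech with different numbers of speakers

Fig. 3 Frame-based extraction of the phoneme-like symbol transition rate

Fig. 4 Comparison of high-level information features for overlapping and non-overlapping speech

Fig. 5 Block diagram of overlapping speech detection

Fig. 6 Comparison of overlapping speech detection performance for four features

Table 1 Effect of overlapping speech detection on speaker segmentation performance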

