Journal of Tsinghua University(Science and Technology)    2017, Vol. 57 Issue (1) : 79-83     DOI: 10.16511/j.cnki.qhdxxb.2017.21.015
ELECTRONIC ENGINEERING
Overlapping speech detection using high-level information features
MA Yong1,2, BAO Changchun1
1. School of Electronic Information and Control Engineering, Beijing University of Technology, Beijing 100124, China;
2. School of Physics and Electronic Engineering, Jiangsu Normal University, Xuzhou 221009, China
Abstract  Overlapping speech is one of the main factors that degrade speaker segmentation performance. This paper presents an overlapping speech detection method that uses a high-level information feature to improve speaker segmentation results. A linguistic high-level information feature is extracted from the speech using a universal background model (UBM). A hidden Markov model (HMM) is then trained on the Mel-frequency cepstral coefficients (MFCC) together with the high-level information feature to detect overlapping speech. The detection result is then used in the speaker segmentation of the pre-processed speech. Tests on a dataset generated from the TIMIT database show that the error ratio for overlapping speech detection is significantly lower than that of the reference method, which uses only the MFCC feature, and that speaker segmentation is also significantly improved.
Keywords  overlapping speech detection; high-level information feature; speaker segmentation
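The HMM decoding step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a two-state HMM (state 0 for single-speaker speech, state 1 for overlapping speech) and uses made-up per-frame likelihoods and transition probabilities; the abstract does not state the HMM topology or parameters used in the paper, so the function name `viterbi` and all numbers below are hypothetical. In the method described, the per-frame scores would come from models trained on the MFCC and high-level information features.

```python
import math


def viterbi(log_emis, log_trans, log_init):
    """Return the most likely state sequence for per-frame log-likelihoods.

    log_emis[t][s]  : log-likelihood of frame t under state s
    log_trans[r][s] : log-probability of moving from state r to state s
    log_init[s]     : log-probability of starting in state s
    """
    n_states = len(log_init)
    # delta[s] is the best log-score of any path ending in state s.
    delta = [log_init[s] + log_emis[0][s] for s in range(n_states)]
    backptr = []
    for frame in log_emis[1:]:
        prev, delta, ptr = delta, [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            ptr.append(best)
            delta.append(prev[best] + log_trans[best][s] + frame[s])
        backptr.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical values: "sticky" transitions discourage rapid state switching.
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
log_init = [math.log(0.5), math.log(0.5)]
# Hypothetical per-frame likelihoods (single speaker, overlap).
frame_likelihoods = [(0.8, 0.2), (0.8, 0.2), (0.4, 0.6), (0.8, 0.2),
                     (0.2, 0.8), (0.2, 0.8), (0.2, 0.8)]
log_emis = [[math.log(p) for p in frame] for frame in frame_likelihoods]

path = viterbi(log_emis, log_trans, log_init)
print(path)  # [0, 0, 0, 0, 1, 1, 1]
```

In this toy input the third frame on its own slightly favours overlap, but the transition penalty smooths it away, so only the final run of frames is labelled as overlapping speech.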
ZTFLH:  TN912.3  
Issue Date: 15 January 2017
Cite this article:   
MA Yong, BAO Changchun. Overlapping speech detection using high-level information features[J]. Journal of Tsinghua University(Science and Technology), 2017, 57(1): 79-83.
URL:  
http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2017.21.015     OR     http://jst.tsinghuajournals.com/EN/Y2017/V57/I1/79
  
  
  
  
  
  
  
Copyright © Journal of Tsinghua University(Science and Technology), All Rights Reserved.