Journal of Tsinghua University(Science and Technology), 2017, Vol. 57, Issue 1: 84-88. DOI: 10.16511/j.cnki.qhdxxb.2017.21.016
ELECTRONIC ENGINEERING
Single-channel speech separation with non-negative matrix factorization and factorial conditional random fields
LI Xu¹, TU Ming², WU Chao¹, GUO Yanmeng¹, NA Yueyue¹, FU Qiang¹, YAN Yonghong¹
1. Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China;
2. Signal Analysis Representation and Perception Laboratory, Arizona State University, Tempe 85281, USA
Abstract  Non-negative matrix factorization (NMF) has been widely used for single-channel speech separation. However, standard NMF-based methods treat each time frame of the speech signal as independent and therefore cannot model its temporal continuity. This paper presents a single-channel speech separation algorithm that combines NMF with factorial conditional random fields (FCRF). First, a model that combines NMF with k-means clustering is built to describe both the spectral structure and the temporal continuity of the speech signal. This model is then used to train the FCRF, which in turn separates the mixed speech signal. Tests show that the algorithm consistently improves separation performance over the active-set Newton algorithm, an NMF-based approach that does not consider the temporal dynamics of the speech signal.
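To make the limitation named in the abstract concrete, the following is a minimal sketch of a conventional supervised NMF separator, not the paper's NMF + k-means + FCRF method and not the active-set Newton algorithm. It assumes magnitude spectrograms as input, Lee-Seung multiplicative updates with a Frobenius cost, and a soft mask for reconstruction. The function names (`nmf`, `separate`), the toy band-limited data, and all parameter values are illustrative assumptions. Note that each column (frame) of the activation matrix is estimated without reference to its neighbors, which is the frame-independence assumption the abstract criticizes.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero


def nmf(V, rank, n_iter=200, W=None, seed=0):
    """Approximate V by W @ H using multiplicative updates (Frobenius cost).

    If W is supplied it is held fixed and only the activations H are estimated.
    """
    rng = np.random.default_rng(seed)
    update_W = W is None
    if update_W:
        W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        # Every frame (column of H) is updated independently of its neighbors.
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        if update_W:
            W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def separate(V_mix, W1, W2, n_iter=200):
    """Split a mixture spectrogram using fixed per-speaker bases W1 and W2."""
    W = np.hstack([W1, W2])
    _, H = nmf(V_mix, W.shape[1], n_iter, W=W)
    k = W1.shape[1]
    S1, S2 = W1 @ H[:k], W2 @ H[k:]
    mask1 = S1 / (S1 + S2 + EPS)  # soft mask with values in [0, 1]
    return mask1 * V_mix, (1.0 - mask1) * V_mix


# Toy data (assumption): speaker 1 occupies the low bins, speaker 2 the high bins.
rng = np.random.default_rng(1)
n_bins, n_frames = 16, 40


def toy_source(lo, hi):
    V = np.zeros((n_bins, n_frames))
    V[lo:hi] = rng.random((hi - lo, 3)) @ rng.random((3, n_frames))
    return V


train1, train2 = toy_source(0, 8), toy_source(8, 16)
src1, src2 = toy_source(0, 8), toy_source(8, 16)
V_mix = src1 + src2

W1, _ = nmf(train1, rank=4)  # speaker-dependent bases learned from training data
W2, _ = nmf(train2, rank=4)
est1, est2 = separate(V_mix, W1, W2)
print("error for speaker 1:", np.linalg.norm(est1 - src1))
```

In this sketch, swapping the columns of `V_mix` in time would simply swap the columns of the estimated activations, so no temporal structure is used. Approaches such as the one described in the abstract add a sequence model on top of the factorization to address this.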
Keywords: single-channel speech separation; factorial conditional random field (FCRF); non-negative matrix factorization (NMF); k-means clustering
Chinese Library Classification (ZTFLH): TN912.3
Issue Date: 15 January 2017
Cite this article:
LI Xu, TU Ming, WU Chao, et al. Single-channel speech separation with non-negative matrix factorization and factorial conditional random fields[J]. Journal of Tsinghua University(Science and Technology), 2017, 57(1): 84-88.
URL:  
http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2017.21.016     OR     http://jst.tsinghuajournals.com/EN/Y2017/V57/I1/84
  
  
  
Copyright © Journal of Tsinghua University(Science and Technology), All Rights Reserved.