Journal of Tsinghua University (Science and Technology)  2017, Vol. 57 Issue (1): 84-88    DOI: 10.16511/j.cnki.qhdxxb.2017.21.016
Electronic Engineering
Single-channel speech separation with non-negative matrix factorization and factorial conditional random fields
LI Xu1, TU Ming2, WU Chao1, GUO Yanmeng1, NA Yueyue1, FU Qiang1, YAN Yonghong1
1. Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China;
2. Signal Analysis Representation and Perception Laboratory, Arizona State University, Tempe 85281, USA
Abstract: Non-negative matrix factorization (NMF) has been widely used for single-channel speech separation. However, standard NMF-based methods assume that the time frames of a speech signal are mutually independent and therefore cannot model its temporal continuity. This paper presents a single-channel speech separation algorithm based on NMF and the factorial conditional random field (FCRF). NMF is first combined with k-means clustering to build a model that describes both the spectral structure and the temporal continuity of clean speech. This model is then used to train the FCRF, which in turn separates the mixed speech signal. Tests show that the algorithm consistently improves the objective separation metrics compared with the active-set Newton algorithm (ASNA), an NMF-based approach that does not consider the temporal dynamics of the speech signal.
Key words: single-channel speech separation; factorial conditional random field (FCRF); non-negative matrix factorization (NMF); k-means clustering
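To make the frame-independence limitation described in the abstract concrete, here is a minimal sketch of supervised NMF separation on magnitude spectrograms. It is not the authors' implementation: it uses the classic multiplicative updates for the KL divergence rather than ASNA, and it leaves out the k-means and FCRF stages entirely. The function names (`kl_nmf`, `activations`, `separate`), the soft-mask reconstruction, and all parameter values are assumptions made for illustration.

```python
import numpy as np

EPS = 1e-12  # small constant guarding against division by zero


def kl_nmf(V, rank, n_iter=200, seed=0):
    """Factor a non-negative matrix V (freq x frames) as W @ H using
    multiplicative updates for the KL divergence."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + EPS
    H = rng.random((rank, n_frames)) + EPS
    for _ in range(n_iter):
        WH = W @ H + EPS
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + EPS)
        WH = W @ H + EPS
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + EPS)
    return W, H


def activations(V, W, n_iter=200, seed=0):
    """Estimate H for a fixed dictionary W (only H is updated).
    Each column of H is fitted independently of its neighbours,
    which is the frame-independence assumption."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        WH = W @ H + EPS
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + EPS)
    return H


def separate(V_mix, W1, W2, n_iter=200):
    """Split a mixture spectrogram with two pre-trained dictionaries.
    The soft mask is an illustrative choice, not taken from the paper."""
    W = np.hstack([W1, W2])
    H = activations(V_mix, W, n_iter)
    k1 = W1.shape[1]
    S1 = W1 @ H[:k1]   # speaker-1 part of the model
    S2 = W2 @ H[k1:]   # speaker-2 part of the model
    mask1 = S1 / (S1 + S2 + EPS)
    return mask1 * V_mix, (1.0 - mask1) * V_mix
```

In use, one would train `W1` and `W2` with `kl_nmf` on clean spectrograms of each speaker and then call `separate` on the mixture. Because `activations` treats every frame on its own, nothing in this sketch rewards smooth trajectories over time; that gap is what the paper's k-means and FCRF stages are meant to fill.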
Received: 2015-07-10      Published: 2017-01-20
CLC number:  TN912.3  
Corresponding author: FU Qiang, Research Professor, E-mail: qfu@hccl.ioa.ac.cn
Cite this article:
LI Xu, TU Ming, WU Chao, GUO Yanmeng, NA Yueyue, FU Qiang, YAN Yonghong. Single-channel speech separation with non-negative matrix factorization and factorial conditional random fields. Journal of Tsinghua University (Science and Technology), 2017, 57(1): 84-88.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2017.21.016  or  http://jst.tsinghuajournals.com/CN/Y2017/V57/I1/84
Fig. 1 FCRF graphical model
Fig. 2 Block diagram of the speech separation system
Table 1 Average SDR, SIR, and SAR results
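For readers unfamiliar with the three metrics named in Table 1, the block below restates how they are commonly defined in the BSS_EVAL framework of Vincent, Gribonval and Févotte. This assumes the paper follows those definitions; the symbols come from that framework and do not appear on this page. An estimated source is decomposed into a target component plus interference, noise, and artifact errors, and each metric is an energy ratio in decibels:

```latex
\begin{align}
\hat{s} &= s_{\mathrm{target}} + e_{\mathrm{interf}} + e_{\mathrm{noise}} + e_{\mathrm{artif}} \\
\mathrm{SDR} &= 10 \log_{10}
  \frac{\lVert s_{\mathrm{target}} \rVert^{2}}
       {\lVert e_{\mathrm{interf}} + e_{\mathrm{noise}} + e_{\mathrm{artif}} \rVert^{2}} \\
\mathrm{SIR} &= 10 \log_{10}
  \frac{\lVert s_{\mathrm{target}} \rVert^{2}}
       {\lVert e_{\mathrm{interf}} \rVert^{2}} \\
\mathrm{SAR} &= 10 \log_{10}
  \frac{\lVert s_{\mathrm{target}} + e_{\mathrm{interf}} + e_{\mathrm{noise}} \rVert^{2}}
       {\lVert e_{\mathrm{artif}} \rVert^{2}}
\end{align}
```

Higher values are better for all three: SDR summarizes overall distortion, SIR isolates leakage from the competing speaker, and SAR isolates artifacts introduced by the separation algorithm itself.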
Copyright © Editorial Office of Journal of Tsinghua University (Science and Technology)