Journal of Tsinghua University (Science and Technology)  2019, Vol. 59 Issue (6): 476-481    DOI: 10.16511/j.cnki.qhdxxb.2019.21.011
Computer Science and Technology
Speech feature fusion algorithm based on acoustic state likelihood and supervised state modelling
(Original Chinese title: 基于声学状态似然值得分模型及监督状态模型的语音识别特征融合算法)
XIAO Xi (肖熙), XU Chen (徐晨)
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Abstract: When a Gaussian mixture model-hidden Markov model (GMM-HMM) speech recognizer uses the most likely state sequence (MLSS) criterion to find the best state sequence for the observations, it considers only the state with the maximum likelihood for each speech frame. The influence of the other, suboptimal states on the current frame is ignored, so information is lost and the recognition rate drops. To make better use of the acoustic state likelihoods, this paper proposes an acoustic state likelihood score model and a supervised state model, and derives two features from them: the state likelihood cluster feature (SLCF) and the supervised state feature (SSF). Both features describe how the Mel frequency cepstrum coefficient (MFCC) acoustic features relate to the HMM states. Experiments show that fusing either SLCF or SSF with MFCC improves speech recognition. Compared with a GMM-HMM that uses MFCC alone, fusing SLCF and SSF lowers the total error rate by a relative 6.10% and 9.66%, respectively, for isolated word recognition, and by a relative 2.53% and 11.05%, respectively, for continuous speech recognition.
Key words: supervised state feature; acoustic feature clustering; state likelihood cluster feature
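The abstract does not say how SLCF or SSF are actually computed, so the following Python sketch is not the authors' algorithm. It only illustrates the general idea the abstract describes: a hard per-frame decision keeps just the best-scoring state, while appending per-state scores to the frame keeps the information from suboptimal states. Everything here is an assumption made for illustration: each state is a single diagonal Gaussian rather than a full GMM, the "MFCC" frame is a toy 2-dimensional vector, fusion is plain concatenation, and the function names are hypothetical.

```python
import math

def diag_gauss_loglik(x, mean, var):
    """Log-density of x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def state_logliks(frame, states):
    """Log-likelihood of one frame under every state model."""
    return [diag_gauss_loglik(frame, mean, var) for mean, var in states]

def normalize(logliks):
    """Turn log-likelihoods into scores that sum to 1 (stable softmax)."""
    peak = max(logliks)
    weights = [math.exp(l - peak) for l in logliks]
    total = sum(weights)
    return [w / total for w in weights]

def fuse(frame, states):
    """Concatenate the frame with its normalized per-state scores."""
    return list(frame) + normalize(state_logliks(frame, states))

# Toy data (assumed): three states given as (mean, variance) and one frame.
states = [([0.0, 0.0], [1.0, 1.0]),
          ([1.0, 1.0], [1.0, 1.0]),
          ([4.0, 4.0], [1.0, 1.0])]
frame = [0.6, 0.6]

scores = state_logliks(frame, states)
# Hard decision: only the index of the best state survives.
best_state = max(range(len(states)), key=scores.__getitem__)
# Soft alternative: the frame plus one score per state.
fused = fuse(frame, states)

print(best_state)   # 1
print(len(fused))   # 5
```

In this toy case state 1 wins, but state 0 scores almost as well; the hard decision discards that fact, whereas the fused vector retains it. A real MLSS search would also use transition probabilities over the whole utterance (Viterbi decoding), which this per-frame sketch leaves out.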
Received: 2018-12-07      Published: 2019-06-01
Cite this article:
XIAO Xi, XU Chen. Speech feature fusion algorithm based on acoustic state likelihood and supervised state modelling [J]. Journal of Tsinghua University (Science and Technology), 2019, 59(6): 476-481.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2019.21.011  or  http://jst.tsinghuajournals.com/CN/Y2019/V59/I6/476
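  Fig. 1 Chinese triphone model (with coarticulation states taken into account)
  Fig. 2 Training flowchart of the state likelihood score model
  Fig. 3 GMM-HMM speech recognition model fused with the state likelihood score model and the supervised state model
  Fig. 4 Codebook training flowchart of the new model
  Fig. 5 (Color online) Recognition results for 50 isolated-word male-voice speech samples with the isolated word model using MFCC, SLCF, and SLCF+MFCC
  Table 1 Experimental results of different features in the isolated word speech recognition system
  Table 2 Experimental results of different features in the continuous speech recognition system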