Journal of Tsinghua University (Science and Technology), 2017, Vol. 57, Issue (8): 857-861    DOI: 10.16511/j.cnki.qhdxxb.2017.22.050
Electronic Engineering
Speech recognition adaptive clustering feature extraction algorithms based on the k-means algorithm and the normalized intra-class variance
XIAO Xi, ZHOU Lu
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Abstract: The inter-frame independence assumption in speech recognition models simplifies the computations, but it also reduces model accuracy and can give rise to recognition errors. The objective of this paper is therefore to find a feature that both satisfies the inter-frame independence assumption and retains as much of the original speech information as possible. Two adaptive clustering feature extraction algorithms for speech recognition are presented, one based on the k-means algorithm and one on the normalized intra-class variance; both extract the clustered feature stream adaptively. Speech recognition tests with these features on a Gaussian mixture model-hidden Markov model (GMM-HMM), a duration distribution based HMM (DDBHMM), and a context-dependent deep neural network HMM (CD-DNN-HMM) show that the adaptive feature based on the normalized intra-class variance reduces the recognition error rates of the three models by 10.53%, 5.17%, and 2.65% relative to the original features, demonstrating the effectiveness of the adaptive clustering features.
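The abstract does not spell out the algorithms themselves. As a minimal sketch of one plausible reading — running plain k-means over the feature frames of an utterance and replacing each frame by its cluster centroid to obtain a clustered feature stream — the following code illustrates the idea; the function names and all details beyond "k-means on feature frames" are assumptions, not the paper's actual procedure.

```python
import numpy as np

def kmeans_frames(frames, k, iters=20, seed=0):
    """Plain k-means over speech feature frames (a hypothetical
    stand-in for the paper's k-means-based clustering step)."""
    rng = np.random.default_rng(seed)
    # initialize centroids from k randomly chosen frames
    centroids = frames[rng.choice(len(frames), size=k, replace=False)]
    for _ in range(iters):
        # assign each frame to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned frames
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = frames[labels == j].mean(axis=0)
    return labels, centroids

def clustered_feature_stream(frames, k):
    """Replace every frame by its cluster centroid, yielding a
    feature stream with far fewer distinct vectors per utterance."""
    labels, centroids = kmeans_frames(frames, k)
    return centroids[labels]
```

Because consecutive frames of a quasi-stationary sound tend to fall into the same cluster, the resulting stream carries fewer distinct observation vectors, which is one way such a feature could reduce the likelihood computations counted in Table 1.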
Key words: feature extraction; adaptive clustering feature; inter-frame independence assumption; normalized intra-class variance
Received: 2017-02-13      Published: 2017-08-15
CLC number: TP391.4
Cite this article:
XIAO Xi, ZHOU Lu. Speech recognition adaptive clustering feature extraction algorithms based on the k-means algorithm and the normalized intra-class variance[J]. Journal of Tsinghua University (Science and Technology), 2017, 57(8): 857-861.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2017.22.050  or  http://jst.tsinghuajournals.com/CN/Y2017/V57/I8/857
Fig. 1 Schematic of triphone model recognition [12]
Fig. 2 Feature extraction algorithm based on k-means
Fig. 3 Distribution of the normalized intra-class variance
Fig. 4 Normalized intra-class variance vs. average relative compression ratio for the improved algorithm
Table 1 Numbers of likelihood addition and multiplication operations for the three features
Table 2 Total continuous speech recognition error rates of the three models with different features
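The normalized intra-class variance whose distribution Fig. 3 shows is not defined in this abstract; a common normalization — total within-cluster variance divided by the overall variance of the frames — can be sketched as below. The function name and this exact definition are assumptions for illustration.

```python
import numpy as np

def normalized_intra_class_variance(frames, labels):
    """Within-cluster variance normalized by the overall variance of
    the frames (an assumed definition; the abstract gives no formula).
    Returns a value in [0, 1]; lower means tighter, better-separated clusters."""
    overall_var = frames.var(axis=0).sum()
    intra = 0.0
    for j in np.unique(labels):
        members = frames[labels == j]
        # weight each cluster's variance by its number of frames
        intra += members.var(axis=0).sum() * len(members)
    intra /= len(frames)
    return intra / overall_var
```

Under this reading, the adaptive algorithm could merge or split frame clusters until the normalized intra-class variance falls below a threshold, trading compression rate against fidelity as in Fig. 4.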
[1] Rabiner L R. A tutorial on hidden Markov models and selected applications in speech recognition[J]. Proceedings of the IEEE, 1989, 77(2): 257-286.
[2] Reynolds D A, Rose R C. Robust text-independent speaker identification using Gaussian mixture speaker models[J]. IEEE Transactions on Speech & Audio Processing, 1995, 3(1):72-83.
[3] Sainath T N, Kingsbury B, Saon G, et al. Deep convolutional neural networks for large-scale speech tasks[J]. Neural Networks the Official Journal of the International Neural Network Society, 2015, 64:39-48.
[4] Ansari Z, Seyyedsalehi S A. Toward growing modular deep neural networks for continuous speech recognition[J/OL]. Neural Computing & Applications, 2016:1-20. DOI:10.1007/s00521-016-2438-x.
[5] Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition:The shared views of four research groups[J]. IEEE Signal Processing Magazine, 2012, 29(6):82-97.
[6] Yu D, Deng L, Seide F. The deep tensor neural network with applications to large vocabulary speech recognition[J]. IEEE Transactions on Audio Speech & Language Processing, 2013, 21(2):388-396.
[7] Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets[J]. Neural Computation, 2006, 18(7): 1527-1554.
[8] Yu D, Seltzer M L. Improved bottleneck features using pretrained deep neural networks[C]//INTERSPEECH 2011, Conference of the International Speech Communication Association. Florence, Italy, 2011:237-240.
[9] Dahl G E, Yu D, Deng L, et al. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition[J]. IEEE Transactions on Audio Speech & Language Processing, 2012, 20(1):30-42.
[10] Rabiner L R. A tutorial on hidden Markov models and selected applications in speech recognition[J]. Proceedings of the IEEE, 1989, 77(2): 257-286.
[11] 肖熙. DDBHMM语音识别模型的训练和识别算法[D]. 北京: 清华大学, 2003. XIAO Xi. The training and recognition algorithm for DDBHMM speech recognition model[D]. Beijing: Tsinghua University, 2003. (in Chinese)
[12] 李春, 王作英. 基于语音学分类的三音子识别单元的研究[C]//全国人机语音通讯学术会议. 深圳, 2001: 257-262. LI Chun, WANG Zuoying. Triphone recognition unit based on phonetics category[C]//The 6th National Conference of Human-Computer Speech Communication. Shenzhen, 2001: 257-262. (in Chinese)