Journal of Tsinghua University(Science and Technology)    2017, Vol. 57 Issue (8) : 857-861     DOI: 10.16511/j.cnki.qhdxxb.2017.22.050
ELECTRONIC ENGINEERING
Speech recognition adaptive clustering feature extraction algorithms based on the k-means algorithm and the normalized intra-class variance
XIAO Xi, ZHOU Lu
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Abstract  The inter-frame independence assumption simplifies the computations in speech recognition, but it also reduces model accuracy and can easily cause recognition errors. The objective of this paper is therefore to find a feature that weakens the inter-frame dependence of the speech features while retaining as much of the information in the original speech as possible. Two speech recognition feature extraction algorithms, based on the k-means algorithm and the normalized intra-class variance, are presented. These algorithms provide adaptive clustering feature extraction. Speech recognition tests with these algorithms on a Gaussian mixture model-hidden Markov model (GMM-HMM), a duration distribution based HMM (DDBHMM), and a context dependent deep neural network HMM (CD-DNN-HMM) show that the adaptive feature based on the normalized intra-class variance reduces the recognition error rates by 10.53%, 5.17%, and 2.65%, respectively, relative to the original features. Thus, this adaptive clustering feature extraction algorithm improves speech recognition accuracy.
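The abstract does not give the algorithms themselves, so the following is only a minimal sketch of what an adaptive clustering feature driven by a normalized intra-class variance might look like. Everything in it is an assumption made for illustration, not the authors' method: the function names, the greedy left-to-right segmentation, the choice to normalize each dimension by its utterance-level variance, and the `threshold` value are all hypothetical. The idea shown is that consecutive, similar frames are merged into one segment and replaced by their mean, so that the resulting feature sequence has less frame-to-frame redundancy.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(frames: Sequence[Sequence[float]]) -> Vector:
    """Per-dimension mean of a non-empty list of frames."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def global_variance(frames: Sequence[Sequence[float]]) -> Vector:
    """Per-dimension variance over the whole utterance, floored to avoid division by zero."""
    mu = mean_vector(frames)
    n = len(frames)
    return [
        max(sum((x - m) ** 2 for x in col) / n, 1e-12)
        for col, m in zip(zip(*frames), mu)
    ]


def normalized_intra_class_variance(
    segment: Sequence[Sequence[float]], global_var: Sequence[float]
) -> float:
    """ASSUMED definition: mean squared deviation from the segment centroid,
    with each dimension divided by its utterance-level variance."""
    mu = mean_vector(segment)
    total = 0.0
    for frame in segment:
        for x, m, v in zip(frame, mu, global_var):
            total += (x - m) ** 2 / v
    return total / (len(segment) * len(mu))


def adaptive_cluster_features(
    frames: Sequence[Sequence[float]], threshold: float = 0.1
) -> List[Vector]:
    """Greedily grow a segment of consecutive frames while its normalized
    intra-class variance stays at or below `threshold` (hypothetical value);
    emit one mean vector per segment."""
    if not frames:
        return []
    gvar = global_variance(frames)
    features: List[Vector] = []
    current = [list(frames[0])]
    for frame in frames[1:]:
        candidate = current + [list(frame)]
        if normalized_intra_class_variance(candidate, gvar) <= threshold:
            current = candidate  # frame is similar enough: absorb it
        else:
            features.append(mean_vector(current))  # close the segment
            current = [list(frame)]
    features.append(mean_vector(current))
    return features


if __name__ == "__main__":
    # Toy 2-D "frames": three near the origin, then three near (5, 5).
    toy = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
           [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    for vec in adaptive_cluster_features(toy):
        print(vec)
```

Because the segment length adapts to how quickly the signal changes, the number of output vectors varies per utterance. A k-means variant could, under similar assumptions, fix the number of clusters instead of the variance threshold.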
Keywords  feature extraction; adaptive clustering feature; assumption of inter-frame independence; normalized intra-class variance
ZTFLH:  TP391.4  
Issue Date: 15 August 2017
Cite this article:   
XIAO Xi, ZHOU Lu. Speech recognition adaptive clustering feature extraction algorithms based on the k-means algorithm and the normalized intra-class variance[J]. Journal of Tsinghua University(Science and Technology), 2017, 57(8): 857-861.
URL:  
http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2017.22.050     OR     http://jst.tsinghuajournals.com/EN/Y2017/V57/I8/857
  
  
  
  
  
  