Journal of Tsinghua University(Science and Technology)    2018, Vol. 58 Issue (4) : 342-346     DOI: 10.16511/j.cnki.qhdxxb.2018.22.020
COMPUTER SCIENCE AND TECHNOLOGY
Crosslingual acoustic modeling in Uyghur speech recognition
NURMEMET Yolwas¹, LIU Junhua², WUSHOUR Silamu¹, REYIMAN Tursun¹, DAWEL Abilhayer¹
1. College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China;
2. iFLYTEK Co., Ltd., Hefei 230088, China
Abstract  Uyghur has little speech data available for training acoustic models because data acquisition and annotation are difficult. This paper describes a crosslingual acoustic modeling method based on long short-term memory (LSTM) models. A large amount of Chinese training data is first used to train a deep neural network acoustic model. The weights of the network's output layer are then randomly modified to create the output layer for Uyghur. The Uyghur acoustic model is then trained on Uyghur speech data, updating all the weights. Tests show that this method reduces the word error rates for Uyghur transcription and dictation recognition by 20% and 30%, respectively, compared with the baseline system. The method thus improves the Uyghur acoustic model by using the Chinese data to provide better initial weights for the hidden layers of the neural network, and it enhances the network's robustness.
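The weight-transfer step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a toy feed-forward stack in place of the LSTM, reads "randomly modified" as a fresh random initialization of the output layer, and all function names and layer sizes are hypothetical.

```python
import random

def init_layer(n_in, n_out, rng, scale=0.1):
    """Return an n_out x n_in weight matrix filled with small random values."""
    return [[rng.uniform(-scale, scale) for _ in range(n_in)]
            for _ in range(n_out)]

def build_model(layer_sizes, rng):
    """Build a toy network as a list of weight matrices (one per layer)."""
    return [init_layer(n_in, n_out, rng)
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

def transfer_to_target(source_model, n_target_outputs, rng):
    """Copy the hidden layers of the source-language model and attach a
    newly initialized output layer sized for the target language."""
    # Deep-copy hidden layers so later fine-tuning leaves the source intact.
    hidden = [[row[:] for row in layer] for layer in source_model[:-1]]
    # Each row of the old output layer has one weight per hidden unit.
    n_hidden = len(source_model[-1][0])
    new_output = init_layer(n_hidden, n_target_outputs, rng)
    return hidden + [new_output]

rng = random.Random(0)

# Hypothetical sizes: 40-dim features, two hidden layers of 64 units,
# 300 output states for Chinese and 200 for Uyghur.
chinese_model = build_model([40, 64, 64, 300], rng)  # stands in for a trained model
uyghur_model = transfer_to_target(chinese_model, 200, rng)

# Fine-tuning on Uyghur speech data would then update ALL weights,
# hidden layers included; that training loop is omitted here.
```

Only the output layer changes shape, because it is the only layer tied to the language-specific set of output targets; the hidden layers start from the values learned on the larger Chinese corpus.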
Keywords: acoustic model; Uyghur; crosslingual; long short-term memory
CLC number: TP391.4
Issue Date: 15 April 2018
Cite this article:   
NURMEMET Yolwas, LIU Junhua, WUSHOUR Silamu, REYIMAN Tursun, DAWEL Abilhayer. Crosslingual acoustic modeling in Uyghur speech recognition[J]. Journal of Tsinghua University(Science and Technology),2018, 58(4): 342-346.
URL:  
http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2018.22.020     OR     http://jst.tsinghuajournals.com/EN/Y2018/V58/I4/342
  
  
  
  