Journal of Tsinghua University(Science and Technology), 2017, Vol. 57, Issue 2: 182-187. DOI: 10.16511/j.cnki.qhdxxb.2017.22.012
INFORMATION ENGINEERING
THUYG-20: A free Uyghur speech database
Aisikaer Rouzi1, YIN Shi1, ZHANG Zhiyong1, WANG Dong1, Askar Hamdulla2, ZHENG Fang1
1. Research Institute of Information Technology, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China;
2. School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
Abstract  Speech data plays a fundamental role in speech recognition research. However, few open speech databases are available to researchers in China, especially for minority languages such as Uyghur. This paper presents a Uyghur continuous speech database that is fully open and free. The database consists of 20 h of training speech and 1 h of test speech, together with all the resources needed to build a complete Uyghur speech recognition system, including a phone set, a lexicon, and text data. A recipe for constructing the baseline system is also described, with results reported on two test sets, one of clean speech and one of noisy speech. The paper thus provides a standard database for Uyghur speech recognition.
Keywords: speech recognition; Uyghur language; corpus; deep neural network (DNN)
Chinese Library Classification (ZTFLH): TP391.4
Issue Date: 15 February 2017
Cite this article:   
Aisikaer Rouzi, YIN Shi, ZHANG Zhiyong, WANG Dong, Askar Hamdulla, ZHENG Fang. THUYG-20: A free Uyghur speech database[J]. Journal of Tsinghua University(Science and Technology),2017, 57(2): 182-187.
URL:  
http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2017.22.012     OR     http://jst.tsinghuajournals.com/EN/Y2017/V57/I2/182
Copyright © Journal of Tsinghua University(Science and Technology), All Rights Reserved.