THUYG-20: A free Uyghur speech database

Aisikaer Rouzi, YIN Shi, ZHANG Zhiyong, WANG Dong, Askar Hamdulla, ZHENG Fang

Journal of Tsinghua University (Science and Technology), 2017, Vol. 57, Issue 2: 182-187. DOI: 10.16511/j.cnki.qhdxxb.2017.22.012
Information Engineering


Abstract

Speech data resources are the foundation of speech recognition research. Few open speech databases are currently available free of charge to researchers in China, and resources are especially scarce for minority languages such as Uyghur. This paper releases a completely free Uyghur continuous speech database comprising about 20 h of training speech and 1 h of test speech. It also describes the resources needed to build a Uyghur speech recognition system, including a phone set, a lexicon, and text data, along with the scripts (recipe) used to construct a baseline system. Recognition results for the baseline system are reported on both clean and noisy test data. The database offers a standard reference corpus for Uyghur speech recognition research.

Key words

speech recognition / Uyghur language / corpus / deep neural network (DNN)

Cite this article

Aisikaer Rouzi, YIN Shi, ZHANG Zhiyong, WANG Dong, Askar Hamdulla, ZHENG Fang. THUYG-20: A free Uyghur speech database[J]. Journal of Tsinghua University (Science and Technology). 2017, 57(2): 182-187. https://doi.org/10.16511/j.cnki.qhdxxb.2017.22.012

CLC number: TP391.4

