THUYG-20:免费的维吾尔语语音数据库

doi:10.16511/j.cnki.qhdxxb.2017.22.012

摘要
图/表
参考文献
相关文章
Metrics

全文: PDF(1153 KB)
输出: BibTeX | EndNote (RIS)

摘要语音数据资源是语音识别研究的基础。当前国内只有为数不多的开放的语音数据库供研究者免费使用，特别是在维吾尔语等少数民族语音识别方面，数据资源更为贫乏。该文发布一个完全免费的维吾尔语连续语音数据库，该数据库包括约20 h的训练数据和1 h的测试数据，同时介绍了构建维吾尔语语音识别系统所需要的音素集、词表、文本数据等相关资源，以及用于构建基线系统的脚本。给出了该基线系统在纯净测试数据和噪声测试数据上的识别性能。该数据库为维吾尔语语音识别研究提供了可以借鉴的标准数据库。

	服务

	把本文推荐给朋友
	加入引用管理器
	E-mail Alert
	RSS

	作者相关文章

关键词 ：语音识别, 维吾尔语, 语料库, 深度神经网络(DNN)

Abstract：Speech data plays a fundamental role in research on speech recognition. However, there are few open speech databases available for researchers in China, especially for minor languages such as Uyghur. This paper develops a Uyghur continuous speech database which is totally open and free. The database consists of 20 h of training speech and 1 h of test speech, as well as all the resources needed to construct a full Uyghur speech recognition system, including a phone set, lexicon, and text data. A recipe used to construct the baseline system is also described with results for two test sets involving clean speech and noisy speech. This paper provides a standard database for Uyghur speech recognition.

Key words： speech recognition Uyghur language corpus deep neural network (DNN)

收稿日期: 2016-06-24 出版日期: 2017-02-15

ZTFLH:

TP391.4

通讯作者: 郑方,教授,E-mail:fzheng@tsinghua.edu.cn E-mail: fzheng@tsinghua.edu.cn

引用本文:

艾斯卡尔·肉孜, 殷实, 张之勇, 王东, 艾斯卡尔·艾木都拉, 郑方. THUYG-20:免费的维吾尔语语音数据库[J]. 清华大学学报（自然科学版）, 2017, 57(2): 182-187.
Aisikaer Rouzi, YIN Shi, ZHANG Zhiyong, WANG Dong, Askar Hamdulla, ZHENG Fang. THUYG-20: A free Uyghur speech database. Journal of Tsinghua University(Science and Technology), 2017, 57(2): 182-187.

链接本文:

http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2017.22.012 或 http://jst.tsinghuajournals.com/CN/Y2017/V57/I2/182

表1 THUYGG20语音语料库参数

表2 THUYGG20文本语料库参数

图1 DNNGHMM 模型框架图

表3 两种语言模型在TESTGA 上的识别结果

表4 基线系统在TESTGN 上的识别结果

表5 基线系统加噪训练后在TESTGN 上的识别结果

[1]	王昆仑, 樊志锦, 吐尔洪江, 等. 维吾尔语综合语音数据库系统[C]//第五届全国人机语音通讯学术会议. 哈尔滨, 1998:366-368.WANG Kunlun, FAN Zhijin, Turhunjan, et al. Integrated speech corpus system of Uyghur language[C]//The 5th National Conference on Man-Machine Speech Communication. Harbin, China, 1998:366-368. (in Chinese)
[2]	蔡琴, 吾守尔·斯拉木. 基于HTK的维吾尔语连续数字语音识别[J]. 现代计算机, 2007(4):14-16.CAI Qin, Wushour Silamu. Uighur continuous digital speech recognition based on HTK[J]. Modern Computer, 2007(4):14-16. (in Chinese)
[3]	那斯尔江·吐尔逊, 吾守尔·斯拉木, 陶梅. 基于HTK的维吾尔语连续语音识别研究[C]//第7届中文信息处理国际会议. 武汉, 2007.Nasirjan Tursun, Wushour Silamu, TAO Mei. Research of Uyghur continuous speech recognition based on HTK[C]//The 7th Conference on Chinese Information Processing. Wuhan, China, 2007. (in Chinese)
[4]	努尔麦麦提·尤鲁瓦斯, 吾守尔·斯拉木, 热依曼·吐尔逊. 基于音节的维吾尔语大词汇连续语音识别系统[J]. 清华大学学报:自然科学版, 2013, 53(6):741-744.Nurmemet Yolwas, Wushor Silamu, Reyiman Tursun. Syllable based language model for large vocabulary continuous speech recognition of Uyghur[J]. Journal of Tsinghua University:Science and Technology, 2013, 53(6):741-744. (in Chinese)
[5]	Nasirjan Tursun, Wushour Silamu. Large vocabulary continuous speech recognition in Uyghur:Data preparation and experimental results[C]//Chinese Spoken Language Processing. Kunming, China, 2008:1-4.
[6]	张小燕, 宿建军, 薛化建, 等. 维吾尔语语音识别语料库中的OOV研究[J]. 计算机工程与设计, 2012, 33(2):772-776.ZHANG Xiaoyan, SU Jianjun, XUE Huajian, et al. Research on OOV problem in constructing Uyghur speech corpus[J]. Computer Engineering and Design, 2012, 33(2):772-776. (in Chinese)
[7]	王昆仑. 维吾尔语音节语音识别与识别基元的研究[J]. 计算机科学, 2003, 30(7):182-184.WANG Kunlun. A study of Uighur syllable speech recognition and the base element of the recognition[J]. Computer Science, 2003, 30(7):182-184. (in Chinese)
[8]	王昆仑. 基于CDCPM的维吾尔语非特定人语音识别[J]. 计算机研究与发展, 2001, 38(10):1242-1246.WANG Kunlun. Uighur speaker independent speech recognition based on CDCPM[J]. Journal of Computer Research & Development, 2001, 38(10):1242-1246. (in Chinese)
[9]	努尔麦麦提·尤鲁瓦斯, 吾守尔·斯拉木, 热依曼·吐尔逊. 维吾尔语大词汇语音识别系统识别单元研究[J]. 北京大学学报:自然科学版, 2014, 50(1):149-152.Nurmemet Yolwas, Wushour Silamu, Reyiman Tursun. Research on recognition units of large vocabulary speech recognition system of Uyghur[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2014, 50(1):149-152. (in Chinese)
[10]	努尔麦麦提·尤鲁瓦斯, 吾守尔·斯拉木. 维吾尔语连续语音识别声学模型优化研究[J]. 计算机工程与应用, 2013, 49(2):145-147.Nurmemet Yolwas, Wushour Silamu. Optimization of acoustic model for Uyghur continuous speech recognition[J]. Computer Engineering and Applications, 2013, 49(2):145-147. (in Chinese)
[11]	Wushour Silamu, Nasirjan Tursun. HMM-based Uyghur continuous speech recognition system[C]//World Congress on Computer Science and Information Engineering. Los Angeles, CA, USA, 2009:243-247.
[12]	那斯尔江·吐尔逊, 吾守尔·斯拉木. 基于隐马尔可夫模型的维吾尔语连续语音识别系统[J]. 计算机应用, 2009, 29(2):2009-2011, 2025.Nasirjan Tursun, Wushour Silamu. Uyghur continuous speech recognition system based on HMM[J]. Computer Application, 2009, 29(2):2009-2011, 2025. (in Chinese)
[13]	陶梅, 吾守尔·斯拉木, 那斯尔江·吐尔逊. 基于HTK的维吾尔语连续语音声学建模[J]. 中文信息学报, 2008, 22(5):56-59.TAO Mei, Wushour Silamu, Nasirjan Tursun. The Uyghur acoustic model based on HTK[J]. Journal of Chinese Information Processing, 2008, 22(5):56-59. (in Chinese)
[14]	杨雅婷, 马博, 王磊, 等. 多发音字典在维吾尔语方言语音识别中的应用[J].清华大学学报:自然科学版, 2011, 51(9):1303-1306.YANG Yating, MA Bo, WANG Lei, et al. Multi-pronunciation dictionary based on Uyghur accent modeling for speech recognition[J]. Journal of Tsinghua University:Science and Technology, 2011, 51(9):1303-1306. (in Chinese)
[15]	杨雅婷, 马博, 王磊, 等. 维吾尔语语音识别中发音变异现象[J].清华大学学报:自然科学版, 2011, 51(9):1230-1233, 1238.YANG Yating, MA Bo, WANG Lei, et al. Uyghur pronunciation variations in automatic speech recognition systems[J]. Journal of Tsinghua University:Science and Technology, 2011, 51(9):1230-1233, 1238. (in Chinese)
[16]	Mijit Ablimit, Neubig G, Mimura M. Uyghur morpheme-based language models and ASR[C]//Proceeding of ICSP. Beijing, China, 2010:581-584.
[17]	Mijit Ablimit, Askar Hamdulla, Kawahara T. Morpheme concatenation approach in language modeling for large-vocabulary Uyghur speech recognition[C]//Oriental COCOSDA. Hsinchu, China, 2011:112-115.
[18]	Mijit Ablimit, Kawahara T, Askar Hamdulla. Lexicon optimization for automatic speech recognition based on discriminative learning[C]//APSIPA SC. Xi'an, China, 2011:935-938.
[19]	Mijit Ablimit, Kawahara T, Askar Hamdulla. Discriminative approach to lexical entry selection for automatic speech recognition of agglutinative language[C]//ICASSP. Kyoto, Japan, 2012:5009-5012.
[20]	Mijit Ablimit, Kawahara T, Askar Hamdulla. Lexicon optimization based on discriminative learning for automatic speech recognition of agglutinative language[J]. Speech Communication, 2014, 60:78-87.
[21]	薛化建, 董兴华, 周喜, 等. 基于子字单元的维吾尔语语音识别研究[J]. 计算机工程, 2011, 37(20):208-210.XUE Huajian, DONG Xinghua, ZHOU Xi, et al. Research on Uyghur speech recognition based on subword unit[J]. Computer Engineering, 2011, 37(20):208-210. (in Chinese)
[22]	LI Xin, CAI Shang, PAN Jielin. Large vocabulary Uyghur continuous speech recognition based on stems and suffixes[C]//Chinese Spoken Language Processing (ISCSLP). Tainan, China, 2010:220-223.
[23]	米日古力·阿布都热素, 艾克白尔·帕塔尔, 艾斯卡尔·艾木都拉. 基于电话语料的维吾尔连续音素识别[J]. 通信技术, 2012, 45(7):54-56.Mirigul Abdursul, Akbar Pattar, Askar Hamdulla. Telephone speech corpus-based Uyghur continuous phoneme recognition[J]. Communication Technology, 2012, 45(7):54-56. (in Chinese)
[24]	Povey D, Ghoshal A, Boulianne G, et al. The Kaldi speech recognition toolkit[C]//Proc of ASRU. Waikoloa, HI, USA, 2011.
[25]	YIN Shi, LIU Chao, ZHANG Zhiyong, et al. Noisy training for deep neural networks in speech recognition[J]. EURASIP Journal on Audio, Speech, and Music Processing, 2015, 2015(1):1-14.

[1]	张雪英, 牛溥华, 高帆. 基于DNN-LSTM的VAD算法[J]. 清华大学学报（自然科学版）, 2018, 58(5): 509-515.
[2]	努尔麦麦提·尤鲁瓦斯, 刘俊华, 吾守尔·斯拉木, 热依曼·吐尔逊, 达吾勒·阿布都哈依尔. 跨语言声学模型在维吾尔语语音识别中的应用[J]. 清华大学学报（自然科学版）, 2018, 58(4): 342-346.
[3]	张宇, 张鹏远, 颜永红. 基于注意力LSTM和多任务学习的远场语音识别[J]. 清华大学学报（自然科学版）, 2018, 58(3): 249-253.
[4]	易江燕, 陶建华, 刘斌, 温正棋. 基于迁移学习的噪声鲁棒语音识别声学建模[J]. 清华大学学报（自然科学版）, 2018, 58(1): 55-60.
[5]	傅睿博, 陶建华, 李雅, 温正棋. 基于静音时长和文本特征融合的韵律边界自动标注[J]. 清华大学学报（自然科学版）, 2018, 58(1): 61-66,74.
[6]	王建荣, 高永春, 张句, 魏建国, 党建武. 基于Kinect辅助的机器人带噪语音识别[J]. 清华大学学报（自然科学版）, 2017, 57(9): 921-925.
[7]	哈里旦木·阿布都克里木, 刘洋, 孙茂松. 神经机器翻译系统在维吾尔语-汉语翻译中的性能对比[J]. 清华大学学报（自然科学版）, 2017, 57(8): 878-883.
[8]	阿布都克力木·阿布力孜, 江铭虎, 姚登峰, 哈里旦木·阿布都克里木. 形态复杂词加工的认知神经机制[J]. 清华大学学报（自然科学版）, 2017, 57(4): 393-398.
[9]	米吉提·阿不里米提, 艾克白尔·帕塔尔, 艾斯卡尔·艾木都拉. 基于层次化结构的语言模型单元集优化[J]. 清华大学学报（自然科学版）, 2017, 57(3): 257-263.
[10]	赛牙热·依马木, 热依莱木·帕尔哈提, 艾斯卡尔·艾木都拉, 李志军. 基于不同关键词提取算法的维吾尔文本情感辨识[J]. 清华大学学报（自然科学版）, 2017, 57(3): 270-273.
[11]	张鹏远, 计哲, 侯炜, 金鑫, 韩卫生. 小资源下语音识别算法设计与优化[J]. 清华大学学报（自然科学版）, 2017, 57(2): 147-152.
[12]	王建荣, 张句, 路文焕, 魏建国, 党建武. 机器人自身噪声环境下的自动语音识别[J]. 清华大学学报（自然科学版）, 2017, 57(2): 153-157.
[13]	热合木·马合木提, 于斯音·于苏普, 张家俊, 宗成庆, 艾斯卡尔·艾木都拉. 基于模糊匹配与音字转换的维吾尔语人名识别[J]. 清华大学学报（自然科学版）, 2017, 57(2): 188-196.
[14]	阿不都萨拉木·达吾提, 于斯音·于苏普, 艾斯卡尔·艾木都拉. 类别区分词与情感词典相结合的维吾尔文句子情感分类[J]. 清华大学学报（自然科学版）, 2017, 57(2): 197-201.
[15]	哈妮克孜·伊拉洪, 古力米热·依玛木, 玛依努尔·阿吾力提甫, 姑丽加玛丽·麦麦提艾力, 艾斯卡尔·艾木都拉. 维吾尔语感叹句语调起伏度[J]. 清华大学学报（自然科学版）, 2017, 57(12): 1254-1258.

Viewed

Full text

Abstract

Cited

Shared

Discussed