Please wait a minute...
 首页  期刊介绍 期刊订阅 联系我们 横山亮次奖 百年刊庆
 
最新录用  |  预出版  |  当期目录  |  过刊浏览  |  阅读排行  |  下载排行  |  引用排行  |  横山亮次奖  |  百年刊庆
清华大学学报(自然科学版)  2017, Vol. 57 Issue (2): 147-152    DOI: 10.16511/j.cnki.qhdxxb.2017.22.006
  信息工程 本期目录 | 过刊浏览 | 高级检索 |
小资源下语音识别算法设计与优化
张鹏远1, 计哲2, 侯炜2, 金鑫2, 韩卫生1
1. 中国科学院 声学研究所, 语言声学与内容理解重点实验室, 北京 100190;
2. 国家计算机网络应急技术处理协调中心, 北京 100029
Design and optimization of a low resource speech recognition system
ZHANG Pengyuan1, JI Zhe2, HOU Wei2, JIN Xin2, HAN Weisheng1
1. Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China;
2. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China
全文: PDF(1225 KB)  
输出: BibTeX | EndNote (RIS)      
摘要 可穿戴设备和智能家居系统需要语音识别引擎占用极小的资源并具有较强的拒识能力。传统的语音识别算法无法满足小资源系统的这种需求。该文针对小资源下语音识别系统,在解码策略和拒识算法设计上均提出了改进方法。在解码策略上,通过修改垃圾音素的重入,使得集外语音的拒识率提高到64.8%,而内存占用只增加了8.5 kB。在拒识算法上,提出了离线计算背景概率和在线查表的方法,与基线系统相比,在集内识别率略有损失的情况下,集外拒识率达到93.8%,而内存占用和计算速度也得到了优化。
服务
把本文推荐给朋友
加入引用管理器
E-mail Alert
RSS
作者相关文章
张鹏远
计哲
侯炜
金鑫
韩卫生
关键词 语音识别小资源置信度    
Abstract:Wearable devices and smart home systems need speech recognition engines with few resources and high rejection rates. Traditional methods cannot provide such systems. This paper presents algorithms for decoding and rejection for a low source speech recognition system. The decoding improves the rejection rate up to 64.8% by changing the filler reentry while the memory is only increased 8.5 kB compared with the baseline system. The rejection algorithm computes a background probability which is compared to similar probabilities calculated in advance online decoding. The system gives a rejection rate of 93.8% with little loss in the recognition rate. The memory and computational speed are also optimized.
Key wordsspeech recognition    low resource    confidence measure
收稿日期: 2016-06-29      出版日期: 2017-02-15
ZTFLH:  TN912.34  
引用本文:   
张鹏远, 计哲, 侯炜, 金鑫, 韩卫生. 小资源下语音识别算法设计与优化[J]. 清华大学学报(自然科学版), 2017, 57(2): 147-152.
ZHANG Pengyuan, JI Zhe, HOU Wei, JIN Xin, HAN Weisheng. Design and optimization of a low resource speech recognition system. Journal of Tsinghua University(Science and Technology), 2017, 57(2): 147-152.
链接本文:  
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2017.22.006  或          http://jst.tsinghuajournals.com/CN/Y2017/V57/I2/147
  图1 命令词网络
  图2 垃圾音素网络
  图3 垃圾音素的拒识算法流程图
  图4 垃圾音素的重入示例
  图5 在线置信度计算流程图
  表1 不同垃圾音素处理策略的性能对比
  表2 不同置信度策略的性能对比
[1] 韩娜, 钟卓成, 吴振权, 等. 基于体感控制的智能家居系统设计与实现[J]. 信息技术, 2015(12):91-93.HAN Na, ZHONG Zhuocheng, WU Zhenquan, et al. Design and implementation of smart home system based on somatosensory control[J]. Information Technology, 2015(12):91-93. (in Chinese)
[2] 叶高扬, 毕冉. 基于物联网的智能家居系统设计与实现[J]. 计算机应用, 2014(S1):318-319.YE Gaoyang, BI Ran. Design and implementation of smart home system based on Internet of things[J]. Journal of Computer Applications, 2014(S1):318-319. (in Chinese)
[3] Joshi V, Bilgi R, Umesh S, et al. Sub-band based histogram equalization in cepstral domain for speech recognition[J]. Speech Communication, 2015, 69:46-65.
[4] 王智国. 嵌入式人机语音交互系统关键技术研究[D]. 合肥:中国科学技术大学, 2014.WANG Zhiguo. Research on Key Technologies of Embedded Human-Machine Speech Interaction System[D]. Hefei:University of Science and Technology of China, 2014. (in Chinese)
[5] 邵健, 韩疆, 颜永红. 嵌入式语音识别中一种高效的搜索树构造方法[C]//第8届全国人机语音通讯学术会议. 北京, 2005.SHAO Jian, HAN Jiang, YAN Yonghong. An efficient search algorithm in embed speech recognition[C]//The Eighth National Conference on Man-Machine Speech Communication. Beijing, China, 2005. (in Chinese)
[6] Jiang H. Confidence measures for speech recognition:A survey[J]. Speech Communication, 2005, 45(4):455-470.
[7] Sanchez-Cortina I, Andrés-Ferrer J, Sanchis A, et al. Speaker-adapted confidence measures for speech recognition of video lectures[J]. Computer Speech & Language, 2016, 37:11-23.
[8] Young S R. Detecting misrecognitions and out-of-vocabulary words[C]//Acoustics, Speech, and Signal Processing. Adelaide, SA, Australia, 1994, 2:21-24.
[9] Wessel F, Schluter R, Macherey K, et al. Confidence measures for large vocabulary continuous speech recognition[J]. IEEE Transactions on Speech and Audio Processing, 2001, 9(3):288-298.
[10] Yoma N B, Carrasco J, Molina C. Bayes-based confidence measure in speech recognition[J]. IEEE Signal Processing Letters, 2005, 12(11):745-748.
[11] Sherif A, Scordilis M S. Beam search pruning in speech recognition using a posterior probability-based confidence measure[J]. Speech Communication, 2003, 42:409-428.
[12] Sanchis A, Juan A, Vidal E. A word-based naïve Bayes classifier for confidence estimation in speech recognition[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(2):565-574.
[1] 张宇, 张鹏远, 颜永红. 基于注意力LSTM和多任务学习的远场语音识别[J]. 清华大学学报(自然科学版), 2018, 58(3): 249-253.
[2] 易江燕, 陶建华, 刘斌, 温正棋. 基于迁移学习的噪声鲁棒语音识别声学建模[J]. 清华大学学报(自然科学版), 2018, 58(1): 55-60.
[3] 王建荣, 高永春, 张句, 魏建国, 党建武. 基于Kinect辅助的机器人带噪语音识别[J]. 清华大学学报(自然科学版), 2017, 57(9): 921-925.
[4] 米吉提·阿不里米提, 艾克白尔·帕塔尔, 艾斯卡尔·艾木都拉. 基于层次化结构的语言模型单元集优化[J]. 清华大学学报(自然科学版), 2017, 57(3): 257-263.
[5] 王建荣, 张句, 路文焕, 魏建国, 党建武. 机器人自身噪声环境下的自动语音识别[J]. 清华大学学报(自然科学版), 2017, 57(2): 153-157.
[6] 艾斯卡尔·肉孜, 殷实, 张之勇, 王东, 艾斯卡尔·艾木都拉, 郑方. THUYG-20:免费的维吾尔语语音数据库[J]. 清华大学学报(自然科学版), 2017, 57(2): 182-187.
[7] 邢安昊, 张鹏远, 潘接林, 颜永红. 基于SVD的DNN裁剪方法和重训练[J]. 清华大学学报(自然科学版), 2016, 56(7): 772-776.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
版权所有 © 《清华大学学报(自然科学版)》编辑部
本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn