Design and optimization of a low resource speech recognition system

doi:10.16511/j.cnki.qhdxxb.2017.22.006

Abstract: Wearable devices and smart home systems need speech recognition engines that use very few resources and reliably reject out-of-set speech. Traditional speech recognition algorithms cannot meet these requirements. This paper proposes improvements to both the decoding strategy and the rejection algorithm of a low-resource speech recognition system. In decoding, modifying the reentry of filler (garbage) phonemes raises the rejection rate for out-of-set speech to 64.8%, while memory usage increases by only 8.5 kB. For rejection, background probabilities are computed offline and looked up in a table during online decoding. Compared with the baseline system, this gives an out-of-set rejection rate of 93.8% with a slight loss in the in-set recognition rate, and memory usage and computation speed are also improved.

Keywords: speech recognition, low resource, confidence measure
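The offline background probability and online table lookup mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so everything here is an assumption: the confidence is taken to be a mean per-frame log-likelihood ratio, the table is keyed by HMM state id, and the names `build_background_table`, `confidence`, and `accept` are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# One entry per frame: (state id, log-likelihood). This layout is an assumption.
Frame = Tuple[int, float]


def build_background_table(samples: Sequence[Frame]) -> Dict[int, float]:
    """Offline step (assumed): average background log-likelihood per state."""
    totals: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for state, background_ll in samples:
        totals[state] = totals.get(state, 0.0) + background_ll
        counts[state] = counts.get(state, 0) + 1
    return {state: totals[state] / counts[state] for state in totals}


def confidence(path: Sequence[Frame], table: Dict[int, float],
               default_background: float = 0.0) -> float:
    """Online step (assumed): mean log-likelihood ratio using lookups only."""
    if not path:
        raise ValueError("path must contain at least one frame")
    # No background model is evaluated here; each frame costs one table lookup.
    total = sum(ll - table.get(state, default_background) for state, ll in path)
    return total / len(path)


def accept(path: Sequence[Frame], table: Dict[int, float],
           threshold: float) -> bool:
    """Accept the decoded command only if its confidence reaches the threshold."""
    return confidence(path, table) >= threshold
```

Moving the background computation offline is what would save online computation in this reading: the decoder only needs the scores of the best path plus a small table. The threshold would have to be tuned on held-out in-set and out-of-set data.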

Received: 2016-06-29 Published: 2017-02-15

CLC number: TN912.34

Cite this article:

ZHANG Pengyuan, JI Zhe, HOU Wei, JIN Xin, HAN Weisheng. Design and optimization of a low resource speech recognition system[J]. Journal of Tsinghua University (Science and Technology), 2017, 57(2): 147-152.

Link to this article:

http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2017.22.006 or http://jst.tsinghuajournals.com/CN/Y2017/V57/I2/147

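Figure 1 Command word network

Figure 2 Filler phoneme network

Figure 3 Flowchart of the filler phoneme rejection algorithm

Figure 4 Example of filler phoneme reentry

Figure 5 Flowchart of online confidence computation

Table 1 Performance comparison of filler phoneme handling strategies

Table 2 Performance comparison of confidence strategies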
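Filler-based rejection, the subject of Figures 2 to 4, can be sketched as a comparison between the best command path and the best path through a loop of filler phonemes. The page does not say how the paper modifies reentry, so this sketch assumes one plausible reading: switching fillers means leaving the loop and re-entering it, which costs a tunable penalty. The names `filler_loop_score`, `is_rejected`, and `reentry_penalty` are hypothetical.

```python
from typing import Sequence


def filler_loop_score(frames: Sequence[Sequence[float]],
                      reentry_penalty: float) -> float:
    """Best log score through a filler loop (a small Viterbi pass).

    frames[t][k] is the log-likelihood of filler k at frame t. Staying in
    the same filler is free; re-entering the loop costs reentry_penalty
    (an assumed mechanism, not taken from the paper).
    """
    if not frames:
        raise ValueError("need at least one frame")
    best = list(frames[0])
    for frame in frames[1:]:
        # Score of the best predecessor after paying for one reentry.
        reenter = max(best) - reentry_penalty
        best = [ll + max(prev, reenter) for ll, prev in zip(frame, best)]
    return max(best)


def is_rejected(command_score: float, frames: Sequence[Sequence[float]],
                reentry_penalty: float) -> bool:
    """Reject when the filler loop explains the audio better than the command."""
    return filler_loop_score(frames, reentry_penalty) > command_score
```

Under this assumption, a small penalty lets the filler loop fit arbitrary speech closely and so rejects more input, while a large penalty makes the loop less flexible and favors the command network.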

