Design and optimization of a low resource speech recognition system

ZHANG Pengyuan; JI Zhe; HOU Wei; JIN Xin; HAN Weisheng

doi:10.16511/j.cnki.qhdxxb.2017.22.006

Journal of Tsinghua University (Science and Technology), 2017, Vol. 57, Issue 2: 147-152

Information Engineering

1. Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China;
2. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China

Received date: 2016-06-29

Online published: 2017-02-15

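Cite this article

ZHANG Pengyuan, JI Zhe, HOU Wei, JIN Xin, HAN Weisheng. Design and optimization of a low resource speech recognition system[J]. Journal of Tsinghua University (Science and Technology), 2017, 57(2): 147-152. DOI: 10.16511/j.cnki.qhdxxb.2017.22.006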

Abstract

Wearable devices and smart home systems need speech recognition engines that use very few resources and reliably reject out-of-set speech. Traditional speech recognition algorithms cannot meet these requirements. This paper presents improved decoding and rejection algorithms for a low resource speech recognition system. In decoding, modifying the reentry of the filler (garbage) phonemes raises the rejection rate for out-of-set speech to 64.8%, while memory usage increases by only 8.5 kB. In rejection, the background probabilities are computed offline and looked up from a table during online decoding. Compared with the baseline system, this gives an out-of-set rejection rate of 93.8% with a slight loss in the in-set recognition rate, and both memory usage and computational speed are improved.

Key words: speech recognition; low resource; confidence measure
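The abstract's idea of computing background probabilities offline and looking them up online can be illustrated with a minimal sketch. The abstract does not say how the table is indexed or how the confidence score is formed, so everything below is an assumption made for illustration: the vector-quantization codebook, the average log-likelihood-ratio score, the threshold test, and all function names (`nearest_codeword`, `build_background_table`, `confidence`, `accept`). It should not be read as the authors' implementation.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def nearest_codeword(frame: Vector, codebook: List[Vector]) -> int:
    """Return the index of the codeword closest to the frame
    (squared Euclidean distance). The codebook is an assumption."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((f - c) ** 2 for f, c in zip(frame, codebook[i])),
    )


def build_background_table(
    codebook: List[Vector],
    background_loglik: Callable[[Vector], float],
) -> List[float]:
    """Offline step: evaluate the background model once per codeword,
    so no background model has to be evaluated on the device."""
    return [background_loglik(codeword) for codeword in codebook]


def confidence(
    frames: List[Vector],
    target_logliks: List[float],
    codebook: List[Vector],
    table: List[float],
) -> float:
    """Online step: average per-frame log-likelihood ratio between the
    decoded path and the background score read from the table."""
    if not frames or len(frames) != len(target_logliks):
        raise ValueError("frames and target_logliks must be non-empty and aligned")
    total = 0.0
    for frame, target in zip(frames, target_logliks):
        total += target - table[nearest_codeword(frame, codebook)]
    return total / len(frames)


def accept(score: float, threshold: float) -> bool:
    """Accept the hypothesis if its confidence reaches the threshold;
    otherwise it would be rejected as out-of-set speech."""
    return score >= threshold
```

In this sketch the online cost per frame is one nearest-codeword search and one table read, and the stored table holds a single number per codeword. That is the kind of trade-off the abstract points to when it reports improved memory usage and speed, although the actual design in the paper may differ.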
