Wearable devices and smart home systems need speech recognition engines that use very few resources and reliably reject out-of-set speech. Traditional speech recognition algorithms cannot meet these requirements on low-resource systems. This paper proposes improvements to both the decoding strategy and the rejection algorithm of a low-resource speech recognition system. In decoding, modifying the reentry of the filler (garbage) phonemes raises the rejection rate for out-of-set speech to 64.8%, while memory use increases by only 8.5 kB relative to the baseline system. In rejection, background probabilities are computed offline and then looked up from a table during online decoding. Compared with the baseline system, this method achieves an out-of-set rejection rate of 93.8% with a slight loss in the in-set recognition rate, and it also improves memory use and computation speed.
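The abstract does not give the details of the offline background probability and online table lookup scheme, so the following is only a minimal sketch of how such a scheme could look, not the paper's implementation. It assumes one-dimensional features, a hypothetical codebook of quantized feature values, a hypothetical Gaussian mixture as the background model, and a mean per-frame log-likelihood ratio as the confidence score. All function names and parameters are illustrative.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian (features are 1-D here for brevity)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def build_background_table(codebook, background_gmm):
    """Offline step: store one background log-probability per codeword.

    background_gmm is a list of (weight, mean, variance) tuples (assumed model).
    """
    table = []
    for c in codebook:
        comps = [math.log(w) + log_gauss(c, mu, var)
                 for w, mu, var in background_gmm]
        table.append(logsumexp(comps))
    return table


def nearest_codeword(x, codebook):
    """Index of the codeword closest to the feature value x."""
    return min(range(len(codebook)), key=lambda i: abs(x - codebook[i]))


def confidence(frames, keyword_loglikes, codebook, table):
    """Online step: mean per-frame log-likelihood ratio.

    The background term is read from the precomputed table, so no
    background model is evaluated at run time.
    """
    total = 0.0
    for x, ll in zip(frames, keyword_loglikes):
        total += ll - table[nearest_codeword(x, codebook)]
    return total / len(frames)


def accept(frames, keyword_loglikes, codebook, table, threshold=0.0):
    """Accept the hypothesis if its confidence reaches the threshold."""
    return confidence(frames, keyword_loglikes, codebook, table) >= threshold
```

In this sketch the run-time cost of the background term is a nearest-codeword search plus one table read per frame, and the memory cost is one number per codeword. That is the kind of trade-off between memory, speed, and rejection accuracy the abstract describes, although the actual table design in the paper may differ.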