Improved lattice-based speech keyword spotting algorithm


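Abstract: For keyword spotting on multi-candidate Chinese syllable lattices, this work improves both the Gaussian mixture model (GMM) and the multi-candidate (N-best) recognition algorithm. Different simplification strategies for the GMM are first examined and validated experimentally, showing that full covariance matrices give the best recognition performance. The classic multi-candidate token passing algorithm is then modified to account for characteristics of Chinese. Experiments show that the two changes improve the single-candidate (1-best) accuracy of a speech recognition engine with syllable output and substantially improve its multi-candidate performance. Finally, a keyword spotting system based on multi-candidate lattices was built, and the improvements were verified in that system.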
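To make the covariance trade-off concrete, the following is a minimal sketch of per-frame log-density evaluation for a single Gaussian component with a diagonal versus a full covariance matrix. It is an illustration only, not the authors' implementation: the function names (`cholesky`, `log_gauss_diag`, `log_gauss_full`) are hypothetical, and the abstract does not say which simplified variants the paper compares, so only these two cases are shown. The diagonal form costs O(d) operations per component and frame, while the full form costs O(d^2) once the Cholesky factor is available, which is the kind of cost a full-covariance model has to pay for its accuracy.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L * L^T == a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def log_gauss_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance (var holds the diagonal)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - mi) ** 2 / v
        for xi, mi, v in zip(x, mean, var)
    )


def log_gauss_full(x, mean, cov):
    """Log-density of a Gaussian with full covariance, via a Cholesky factor."""
    n = len(x)
    low = cholesky(cov)  # in practice this would be precomputed per component
    diff = [xi - mi for xi, mi in zip(x, mean)]
    # Forward substitution solves low * z = diff; z . z is the Mahalanobis distance.
    z = []
    for i in range(n):
        z.append((diff[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))


if __name__ == "__main__":
    frame = [1.0, 1.0]      # toy 2-D feature vector (assumed values)
    mean = [0.0, 0.0]
    print(log_gauss_diag(frame, mean, [2.0, 1.0]))
    print(log_gauss_full(frame, mean, [[2.0, 0.6], [0.6, 1.0]]))
```

When the off-diagonal entries are zero the two functions agree, so the diagonal model can be read as the full model with feature correlations discarded.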


Keywords: speech keyword spotting, multi-candidate lattice, Gaussian mixture model, CUDA, triphone model

Abstract: An improved lattice-based speech keyword spotting system was developed from the Gaussian mixture model and an improved N-best speech recognition algorithm. First, tests were used to evaluate different simplified structures of Gaussian mixture models. Then, an N-best token passing algorithm was developed from the classic token passing algorithm using some unique pronunciation rules for the Chinese language. These two modifications improve the performance of both the 1-best and N-best speech recognition candidates. Finally, a keyword spotting system was developed based on an N-best lattice to show the effectiveness of these improvements.
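The N-best idea can be sketched generically as follows. This is a toy illustration of token passing with up to N tokens kept per state, not the paper's algorithm: the Chinese-specific modification is not described in the abstract and is not reproduced here. The names `merge_tokens` and `decode`, the single-state syllable models, the free syllable loop without transition penalties, and the scores in the usage example are all assumptions made for the sketch.

```python
def merge_tokens(tokens, n_best):
    """Keep the best score per distinct syllable history, then the n_best highest overall."""
    best = {}
    for score, history in tokens:
        if history not in best or score > best[history]:
            best[history] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [(score, history) for history, score in ranked[:n_best]]


def decode(frame_logps, n_best):
    """N-best token passing over a fully connected loop of single-state syllable models.

    frame_logps: list of dicts, one per frame, mapping syllable -> acoustic log-score.
    Returns up to n_best (score, syllable_history) pairs, best first.
    """
    syllables = list(frame_logps[0])
    # Every syllable may start the utterance.
    tokens = {s: [(frame_logps[0][s], (s,))] for s in syllables}
    for logps in frame_logps[1:]:
        new_tokens = {}
        for s in syllables:
            incoming = list(tokens[s])  # self-loop: stay in the same syllable
            for prev in syllables:
                if prev != s:
                    # Crossing a syllable boundary extends the history by s.
                    incoming += [(score, hist + (s,)) for score, hist in tokens[prev]]
            kept = merge_tokens(incoming, n_best)
            new_tokens[s] = [(score + logps[s], hist) for score, hist in kept]
        tokens = new_tokens
    final = [tok for toks in tokens.values() for tok in toks]
    return merge_tokens(final, n_best)


if __name__ == "__main__":
    # Three frames of made-up log-scores for two syllables.
    scores = [
        {"ba": -1.0, "ma": -3.0},
        {"ba": -1.0, "ma": -3.0},
        {"ba": -4.0, "ma": -0.5},
    ]
    for score, history in decode(scores, n_best=2):
        print(score, " ".join(history))
```

With `n_best=1` the same loop reduces to ordinary single-candidate token passing; the surviving histories at each frame are what a lattice-based keyword search would later index.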

Key words: speech keyword spotting; multi-candidate lattice; Gaussian mixture model; compute unified device architecture (CUDA); triphone model

Received: 2015-03-05 Published: 2015-05-15

CLC number: TP391.4

Cite this article:

XIAO Xi (肖熙), WANG Jingqian (王竞千). Improved lattice-based speech keyword spotting algorithm [J]. Journal of Tsinghua University (Science and Technology), 2015, 55(5): 508-513.

Link to this article:

http://jst.tsinghuajournals.com/CN/ or http://jst.tsinghuajournals.com/CN/Y2015/V55/I5/508

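Fig. 1 Schematic of the triphone speech recognition model

Table 1 Summary of the characteristics of the three GMM types

Table 2 Floating-point operation counts of the three GMM types for processing one speech frame

Table 3 Storage requirements of the three model types

Table 4 First-candidate (1-best) error rates of the three GMM types

Fig. 2 A token crossing a character boundary and its predecessor character network

Fig. 3 Suppression of syllable initials in the conventional token passing algorithm

Table 5 Error rates of the multi-candidate recognition algorithm before and after the improvement

Table 6 Keyword spotting performance before the improvement

Table 7 Keyword spotting performance after the improvement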

