Please wait a minute...
 首页  期刊介绍 期刊订阅 联系我们 横山亮次奖 百年刊庆
 
最新录用  |  预出版  |  当期目录  |  过刊浏览  |  阅读排行  |  下载排行  |  引用排行  |  横山亮次奖  |  百年刊庆
清华大学学报(自然科学版)  2015, Vol. 55 Issue (5): 508-513    
  电子工程 本期目录 | 过刊浏览 | 高级检索 |
基于网格的语音关键词检索算法改进
肖熙, 王竞千
清华大学 电子工程系, 北京 100084
Improved lattice-based speech keyword spotting algorithm
XIAO Xi, WANG Jingqian
Department of Electronic Engineer, Tsinghua University, Beijing 100084, China
全文: PDF(995 KB)  
输出: BibTeX | EndNote (RIS)      
摘要 针对多候选汉语音节网格语音关键词检索任务,在Gauss混合模型以及多候选识别算法方面进行了研究改进。首先探讨了Gauss混合模型的不同简化策略并用实验进行了验证, 证明了全协方差矩阵在识别性能上的优越性; 随后对经典的多候选令牌传递算法做出了针对汉语特点的改进。实验表明这2方面的研究不仅提高了以音节作为输出的语音识别引擎的单候选识别效果, 也大幅提高了多候选的识别性能。最后搭建了一个基于多候选网格的语音关键词检索系统, 在该系统中验证了上述改进的效果。
服务
把本文推荐给朋友
加入引用管理器
E-mail Alert
RSS
作者相关文章
肖熙
王竞千
关键词 语音关键词检索多候选网格Gauss混合模型CUDA三音子模型    
Abstract:An improved lattice-based speech keyword spotting system was developed from the Gaussian mixture model and an improved N-best speech recognition algorithm. First, tests were used to evaluate different simplified structures of Gaussian mixture models. Then, an N-best token passing algorithm was developed from the classic token passing algorithm using some unique pronunciation rules for the Chinese language. These two modifications improve the performance of both the 1-best and N-best speech recognition candidates. Finally, a key word spotting system was developed based on an N-best lattice to show the effectiveness of these improvements.
Key wordsspeech keyword spotting    multi-candidate lattice    Gaussian mixture model    compute unified device architecture (CUDA)    triphone model
收稿日期: 2015-03-05      出版日期: 2015-08-04
ZTFLH:  TP391.4  
引用本文:   
肖熙, 王竞千. 基于网格的语音关键词检索算法改进[J]. 清华大学学报(自然科学版), 2015, 55(5): 508-513.
XIAO Xi, WANG Jingqian. Improved lattice-based speech keyword spotting algorithm. Journal of Tsinghua University(Science and Technology), 2015, 55(5): 508-513.
链接本文:  
http://jst.tsinghuajournals.com/CN/  或          http://jst.tsinghuajournals.com/CN/Y2015/V55/I5/508
  图1 三音子语音识别模型示意
  表1 三类GMM 特征总结
  表2 三类GMM 处理1帧语音浮点运算量分析
  表3 三类模型存储空间对比
  表4 三类GMM 首选错误率对比
  图2 跨越字边界的令牌及其前导字网络示意
  图3 传统令牌传递算法声母抑制现象示意
  表5 多候选识别算法改进前后错误率对比
  表6 改进前关键词检索性能
  表7 改进后关键词检索性能
[1] Reynolds D A, Rose R C. Robust text-independent speaker identification using Gaussian mixture speaker models [J]. IEEE Trans Speech Audio Process, 1995, 3(1): 72-83.
[2] Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups [J]. IEEE Signal Process Mag, 2012, 29(6): 82-97.
[3] Yu D, Deng L, Seide F, The deep tensor neural network with applications to large vocabulary speech recognition [J]. IEEE Trans Audio Speech Lang Process, 2013, 21(2): 388-396.
[4] Pang Z, Tu S, Su D, et al. Discriminative training of GMM-HMM acoustic model by RPCL learning [J]. Front Electr Electron Eng China, 2011, 6(2): 283-290.
[5] Povey D, Burget L, Agarwal M, et al. The subspace Gaussian mixture model: A structured model for speech recognition [J]. Comput Speech Lang, 2011, 25(2): 404-439.
[6] Du J, Hu Y, Jiang H. Boosted mixture learning of gaussian mixture hidden markov models based on maximum likelihood for speech recognition [J]. IEEE Trans Audio Speech Lang Process, 2011, 19(7): 2091-2100.
[7] Veiga A, Lopes C, Sá L, et al. Acoustic similarity scores for keyword spotting [J]. Computational Processing of the Portuguese Language, 2014, 8775: 48-58.
[8] Thambiratnam K, Sridharan S. Dynamic match phone-lattice searches for very fast and accurate unrestricted vocabulary keyword spotting [C]//Proc ICASSP. Philadelphia, PA, USA: IEEE Press, 2005: 465-468.
[9] 罗骏, 欧智坚, 王作英. 基于拼音图的两阶段关键词检索系统 [J]. 清华大学学报, 2005, 45(10): 1356-1359.LUO Jun, OU Zhijian WANG Zuoying. Two-stage keyword spotting system based on syllable graphs [J]. Journal of Tsinghua University (Science and Technology), 2005, 45(10): 1356-1359. (in Chinese)
[10] Young S J, Russell N H, Thornton J H S. Token Passing: A Simple Conceptual Model for Connected Speech Recognition Systems, CUED/F-INFENG/TR, 38 [R]. Cambridge, UK: University of Cambridge, 1989.
[11] 李春, 王作英. 基于语音学分类的三音子识别单元的研究 [C]//第六届全国人机语音通讯学术会议论文集. 深圳: 中国中文信息学会, 2001: 257-262.LI Chun, WANG Zuoying. Triphone recognition unit based on phonetics category [C]//The 6th National Conference of Human-Computer Speech Communication. Shenzhen, China: CIPSC, 2001, 257-262. (in Chinese)
[12] 游展. DDBHMM语音识别段长模型的研究和改进 [D]. 北京: 清华大学, 2008.YOU Zhan. The Research and Improvement on DDBHMM Speech Recognition Model [D]. Beijing: Tsinghua University, 2008. (in Chinese)
[13] 肖熙. DDBHMM语音识别模型的训练和识别算法 [D]. 北京: 清华大学, 2003.XIAO Xi. The Training and Recognition Algorithm for DDBHMM Speech Recognition Model [D]. Beijing: Tsinghua University, 2003. (in Chinese).
[1] 杨宏宇, 唐瑞文. 基于电量消耗的Android平台恶意软件检测[J]. 清华大学学报(自然科学版), 2017, 57(1): 44-49.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
版权所有 © 《清华大学学报(自然科学版)》编辑部
本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn