Please wait a minute...
 首页  期刊介绍 期刊订阅 联系我们 横山亮次奖 百年刊庆
 
最新录用  |  预出版  |  当期目录  |  过刊浏览  |  阅读排行  |  下载排行  |  引用排行  |  横山亮次奖  |  百年刊庆
清华大学学报(自然科学版)  2016, Vol. 56 Issue (11): 1220-1225    DOI: 10.16511/j.cnki.qhdxxb.2016.26.015
  计算机科学与技术 本期目录 | 过刊浏览 | 高级检索 |
基于DNN的发音偏误趋势检测
张劲松1,2, 高迎明1, 解焱陆1
1. 北京语言大学 信息科学学院, 北京 100083;
2. 北京语言大学 对外汉语研究中心, 北京 100083
Mispronunciation tendency detection using deep neural networks
ZHANG Jinsong1,2, GAO Yingming1, XIE Yanlu1
1. College of Information Science, Beijing Language and Culture University, Beijing 100083, China;
2. Center for Studies of Chinese as a Second Language, Beijing Language and Culture University, Beijing 100083, China
全文: PDF(1208 KB)  
输出: BibTeX | EndNote (RIS)      
摘要 正音反馈的计算机辅助对外汉语发音训练系统已有发音偏误趋势的标注体系和基于HMM的偏误趋势检测系统。为了进一步提高系统的性能,该文应用深度神经网络进行声学建模,比较Mel频率倒谱系数(Mel-frequency cepstral coefficient,MFCC)、感知线性预测分析系数(perceptual linear predictive analysis,PLP)和Mel滤波器组系数(Mel filter bank,FBank)3种声学特征参数,并利用网格联合技术整合3种声学特征所得的候选网格。实验结果表明:DNN-HMM模型比GMM-HMM实现了更高检测正确率。针对不同发音偏误趋势,3种声学特征有不同表现,联合系统取得最高性能,最终性能为:错误拒绝率5.5%,错误接受率35.6%,检测正确率88.6%。
服务
把本文推荐给朋友
加入引用管理器
E-mail Alert
RSS
作者相关文章
张劲松
高迎明
解焱陆
关键词 计算机辅助发音训练发音偏误检测深度神经网络    
Abstract:A previous computer aided pronunciation training (CAPT) system with instructive feedback used mispronunciation tendency labeling in a GMM-HMM based detection system. This system is improved here using a DNN-HMM to model the mispronunciation with comparisons of the effects of three kinds of acoustic features, the mel-frequency cepstral coefficient (MFCC), the perceptual linear predictive analysis (PLP) and the Mel filter bank (FBank). The lattice rescore method is also used with these three features. The results show that the DNN-HMM gives a better detection rate than the conventional approach based on the GMM-HMM. Different features behave differently in capturing the specific mispronunciation tendencies, so the integration of these three features based on the lattice rescore gives the best results with an FRR of 5.5%, FAR of 35.6%, and DA of 88.6%.
Key wordscomputer aided pronunciation training    mispronunciation detection    deep neural network
收稿日期: 2016-06-29      出版日期: 2016-11-15
ZTFLH:  TP391.7  
  H193.2  
引用本文:   
张劲松, 高迎明, 解焱陆. 基于DNN的发音偏误趋势检测[J]. 清华大学学报(自然科学版), 2016, 56(11): 1220-1225.
ZHANG Jinsong, GAO Yingming, XIE Yanlu. Mispronunciation tendency detection using deep neural networks. Journal of Tsinghua University(Science and Technology), 2016, 56(11): 1220-1225.
链接本文:  
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2016.26.015  或          http://jst.tsinghuajournals.com/CN/Y2016/V56/I11/1220
  表1 面向CAPT汉语中介语语音语料库音段标注规范(部分)
  图1 检测系统框架图
  图2 DNN结构图
  图3 扩展发音网络
  表2 实验语料统计结果
  表3 实验结果分类
  表4 GMM与DNN模型检测结果(%)
  图4 唇形偏误检测性能
  图5 前后化偏误检测性能
  图6 短化偏误检测性能
  图7 舌叶化偏误检测性能
  表5 不同声学特征以及系统联合检测结果(%)
[1] Witt S M. Automatic error detection in pronunciation training:Where we are and where we need to go[C]//Proceedings of the International Symposium on Automatic Detection of Errors in Pronunciation Training (IS ADEPT). Stockholm, Sweden, 2012:1-8.
[2] Zheng J, Huang C, Chu M, et al. Generalized segment posterior probability for automatic Mandarin pronunciation evaluation[C]//The International Conference on Acoustics, Speech and Signal Processing. Hawii, USA:IEEE Press, 2007:201-204.
[3] Hu W, Qian Y, Soong F K. A new DNN-based high quality pronunciation evaluation for computer-aided language learning (CALL)[C]//Proceedings of Conference of International Speech Communication Association. Lyon, France:International Speech Communication Association Press, 2013:1886-1890.
[4] Neri A, Cucchiarini C, Strik H. ASR-based corrective feedback on pronunciation:Does it really work?[C]//Proceedings of Conference of International Speech Communication Association. Pittsburgh PA, USA:International Speech Communication Association Press, 2006:1982-1985.
[5] Harrison A M, Lo W K, Qian X, et al. Implementation of an extended recognition network for mispronunciation detection and diagnosis in computer-assisted pronunciation training[C]//Proceedings of the 2nd ISCA Workshop on Speech and Language Technology in Education. Warrickshire. Brighton, United Kingdom:International Speech Communication Association Press, 2009:45-48.
[6] Cao W, Wang D, Zhang J, et al. Developing a Chinese L2 speech database of Japanese learners with narrow-phonetic labels for computer assisted pronunciation training[C]//Proceedings of Conference of International Speech Communication Association. Chiba, Japan:International Speech Communication Association Press, 2010:1922-1925.
[7] Duan R, Zhang J, Cao W, et al. A Preliminary study on ASR-based detection of Chinese mispronunciation by Japanese learners[C]//Proceedings of Conference of International Speech Communication Association. Singapore:International Speech Communication Association Press, 2014:1478-1481.
[8] Li K, Meng H. Mispronunciation detection and diagnosis in l2 english speech using multi-distribution Deep Neural Networks[C]//Proceedings of the International Symposium on Chinese Spoken Language Processing (ISCSLP). Singapore:IEEE Press, 2014:255-259.
[9] Hu W, Qian Y, Soong F K. A DNN-based acoustic modeling of tonal language and its application to Mandarin pronunciation training[C]//Acoustics, Speech and Signal Processing (ICASSP). Florence, Italy:IEEE Press, 2014:3206-3210.
[10] Qian X, Meng H M, Soong F K. The use of DBN-HMMs for mispronunciation detection and diagnosis in L2 English to support computer-aided pronunciation training[C]//Proceedings of Conference of International Speech Communication Association. Portland, USA:International Speech Communication Association Press, 2012:775-778.
[11] Hu W, Qian Y, Soong F K. A new neural network based logistic regression classifier for improving mispronunciation detection of L2 language learners[C]//Proceedings of the International Symposium on Chinese Spoken Language Processing (ISCSLP). Singapore:IEEE Press, 2014:245-249.
[12] Golik P, Tüske Z, Schlüter R, et al. Development of the RWTH transcription system for Slovenian[C]//Proceedings of Conference of International Speech Communication Association. Lyon, France:International Speech Communication Association Press, 2013:3107-3111.
[13] Zolnay A, Schlüter R, Ney H. Acoustic feature combination for robust speech recognition[C]//The International Conference on Acoustics, Speech and Signal Processing. Philadelpnia, PENN, USA:IEEE Press, 2005:457-460.
[14] Siniscalchi S M, Li J, Lee C H. A study on lattice rescoring with knowledge scores for automatic speech recognition[C]//Proceedings of Conference of International Speech Communication Association. Pittsburgh PA, USA:International Speech Communication Association Press, 2006:517-520.
[15] Yoon S Y, Hasegawa-Johnson M, Sproat R. Landmark-based automated pronunciation error detection[C]//The International Conference on Acoustics, Speech and Signal Processing. Dallas, TX, USA:IEEE Press, 2010:614-617.
[16] Hinton G, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets[J]. Neural Computation, 2006, 18(7):1527-1554.
[17] Luo D, Yang X, Wang L. Improvement of segmental mispronunciation detection with prior knowledge extracted from large L2 speech corpus[C]//Proceedings of Conference of International Speech Communication Association. Florence, Italy:International Speech Communication Association Press, 2011:1593-1596.
[1] 王文广, 陈运文, 蔡华, 曾彦能, 杨慧宇. 基于混合深度神经网络模型的司法文书智能化处理[J]. 清华大学学报(自然科学版), 2019, 59(7): 505-511.
[2] 王晓明, 赵歆波. 基于深度神经网络的个体阅读眼动预测[J]. 清华大学学报(自然科学版), 2019, 59(6): 468-475.
[3] 张雪英, 牛溥华, 高帆. 基于DNN-LSTM的VAD算法[J]. 清华大学学报(自然科学版), 2018, 58(5): 509-515.
[4] 艾斯卡尔·肉孜, 殷实, 张之勇, 王东, 艾斯卡尔·艾木都拉, 郑方. THUYG-20:免费的维吾尔语语音数据库[J]. 清华大学学报(自然科学版), 2017, 57(2): 182-187.
[5] 田垚, 蔡猛, 何亮, 刘加. 基于深度神经网络和Bottleneck特征的说话人识别系统[J]. 清华大学学报(自然科学版), 2016, 56(11): 1143-1148.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
版权所有 © 《清华大学学报(自然科学版)》编辑部
本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn