Journal of Tsinghua University (Science and Technology)  2016, Vol. 56 Issue (11): 1220-1225    DOI: 10.16511/j.cnki.qhdxxb.2016.26.015
Mispronunciation tendency detection using deep neural networks
ZHANG Jinsong1,2, GAO Yingming1, XIE Yanlu1
1. College of Information Science, Beijing Language and Culture University, Beijing 100083, China;
2. Center for Studies of Chinese as a Second Language, Beijing Language and Culture University, Beijing 100083, China
Abstract: An existing computer aided pronunciation training (CAPT) system for Chinese as a second language gives corrective feedback using a labeling scheme for mispronunciation tendencies and an HMM-based tendency detection system. To improve its performance, this paper uses deep neural networks (DNNs) for acoustic modeling and compares three kinds of acoustic features: Mel-frequency cepstral coefficients (MFCC), perceptual linear predictive (PLP) coefficients, and Mel filter bank (FBank) coefficients. The candidate lattices obtained with the three features are then integrated using a lattice-based combination (lattice rescoring) method. The results show that the DNN-HMM achieves higher detection accuracy than the conventional GMM-HMM. The three features perform differently on different mispronunciation tendencies, and the combined system performs best, with a false rejection rate (FRR) of 5.5%, a false acceptance rate (FAR) of 35.6%, and a detection accuracy (DA) of 88.6%.
Key words: computer aided pronunciation training    mispronunciation detection    deep neural network
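The FRR, FAR, and DA figures quoted in the abstract can be related to raw detection counts with a small sketch. The definitions below follow common usage in mispronunciation detection and are an assumption here, since the paper's own outcome categories (Table 3) are not reproduced on this page. The class name `DetectionCounts`, the function `detection_metrics`, and the example counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class DetectionCounts:
    """Outcome counts for a detector (hypothetical container, not from the paper)."""
    true_accept: int   # correct pronunciation, accepted by the system
    false_reject: int  # correct pronunciation, flagged as mispronounced
    false_accept: int  # mispronunciation, accepted by the system
    true_reject: int   # mispronunciation, flagged as mispronounced


def detection_metrics(c: DetectionCounts) -> dict:
    """Compute FRR, FAR, and DA under the assumed (commonly used) definitions."""
    n_correct = c.true_accept + c.false_reject   # all correctly pronounced phones
    n_mispron = c.false_accept + c.true_reject   # all mispronounced phones
    if n_correct == 0 or n_mispron == 0:
        raise ValueError("need at least one correct and one mispronounced sample")
    total = n_correct + n_mispron
    return {
        "FRR": c.false_reject / n_correct,                # false rejection rate
        "FAR": c.false_accept / n_mispron,                # false acceptance rate
        "DA": (c.true_accept + c.true_reject) / total,    # detection accuracy
    }


# Made-up counts for illustration only; they are not the paper's data.
example = DetectionCounts(true_accept=900, false_reject=100,
                          false_accept=30, true_reject=70)
metrics = detection_metrics(example)
```

Under these definitions, a low FRR together with a fairly high FAR, as reported in the abstract, would mean that the system rarely flags correct pronunciations but still accepts a sizable share of mispronunciations.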
Received: 2016-06-29      Published: 2016-11-15
CLC number: TP391.7
ZHANG Jinsong, GAO Yingming, XIE Yanlu. Mispronunciation tendency detection using deep neural networks[J]. Journal of Tsinghua University (Science and Technology), 2016, 56(11): 1220-1225.
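  Table 1 Segmental labeling conventions of the CAPT-oriented Chinese interlanguage speech corpus (excerpt)
  Fig. 1 Framework of the detection system
  Fig. 2 DNN structure
  Fig. 3 Extended pronunciation network
  Table 2 Statistics of the experimental corpus
  Table 3 Classification of experimental outcomes
  Table 4 Detection results of the GMM and DNN models (%)
  Fig. 4 Detection performance for lip-shape mispronunciation tendencies
  Fig. 5 Detection performance for fronting and backing mispronunciation tendencies
  Fig. 6 Detection performance for shortening mispronunciation tendencies
  Fig. 7 Detection performance for laminalization mispronunciation tendencies
  Table 5 Detection results for the different acoustic features and the combined system (%)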
