Mispronunciation Tendency Detection Using Deep Neural Networks

doi:10.16511/j.cnki.qhdxxb.2016.26.015

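Abstract: The computer-aided pronunciation training system for Chinese as a foreign language, which gives corrective feedback, already had an annotation scheme for mispronunciation tendencies and an HMM-based tendency detection system. To further improve its performance, this paper uses deep neural networks for acoustic modeling and compares three acoustic features: Mel-frequency cepstral coefficients (MFCC), perceptual linear prediction (PLP) coefficients, and Mel filter-bank (FBank) coefficients. It then uses lattice combination to integrate the candidate lattices obtained with the three features. Experimental results show that the DNN-HMM model achieves higher detection accuracy than the GMM-HMM model. The three features perform differently on different mispronunciation tendencies, and the combined system performs best, with a false rejection rate of 5.5%, a false acceptance rate of 35.6%, and a detection accuracy of 88.6%.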
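Two of the three features compared here are closely related. In the standard MFCC recipe, MFCCs are the discrete cosine transform (DCT) of the log Mel filter-bank energies (the FBank feature), truncated to the low-order coefficients; PLP follows a different pipeline and is not shown. The sketch below illustrates only that final DCT step in plain Python, under assumed settings (orthonormal DCT-II, 13 kept coefficients, 40 filters in the example), which are common defaults rather than the paper's configuration. The function name `mfcc_from_log_fbank` is hypothetical.

```python
import math
from typing import List


def mfcc_from_log_fbank(log_fbank: List[float], num_ceps: int = 13) -> List[float]:
    """Compute cepstral coefficients from one frame of log Mel filter-bank energies.

    log_fbank: the FBank feature for one frame (log energy of each Mel filter).
    num_ceps:  number of low-order coefficients to keep (13 is an assumed,
               commonly used default, not a value taken from the paper).
    Uses the orthonormal DCT-II: coefficient 0 is scaled by sqrt(1/N),
    the others by sqrt(2/N).
    """
    n = len(log_fbank)
    if not 0 < num_ceps <= n:
        raise ValueError("num_ceps must be between 1 and the number of filters")
    ceps = []
    for k in range(num_ceps):
        # Project the log spectrum onto the k-th cosine basis function.
        s = sum(x * math.cos(math.pi * k * (2 * m + 1) / (2 * n))
                for m, x in enumerate(log_fbank))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        ceps.append(scale * s)
    return ceps
```

With a perfectly flat log spectrum (for example 40 filters all equal to 2.0), every coefficient except c0 is zero up to floating-point error, because a flat spectrum has no shape for the higher cosines to capture. The DCT decorrelates the filter-bank channels, which suits diagonal-covariance GMMs; a commonly cited reason for feeding FBank directly to DNN acoustic models is that the network can model those correlations itself.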

Authors: 张劲松 (ZHANG Jinsong), 高迎明 (GAO Yingming), 解焱陆 (XIE Yanlu)

Abstract: A previous computer-aided pronunciation training (CAPT) system with instructive feedback used mispronunciation tendency labeling in a GMM-HMM-based detection system. This system is improved here by using a DNN-HMM to model mispronunciations and by comparing three kinds of acoustic features: Mel-frequency cepstral coefficients (MFCC), perceptual linear prediction (PLP) coefficients, and Mel filter-bank (FBank) coefficients. Lattice rescoring is also applied to the outputs obtained with these three features. The results show that the DNN-HMM achieves a higher detection accuracy than the conventional GMM-HMM approach. Different features capture specific mispronunciation tendencies differently, so integrating all three features through lattice rescoring gives the best results: a false rejection rate (FRR) of 5.5%, a false acceptance rate (FAR) of 35.6%, and a detection accuracy (DA) of 88.6%.
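The paper's combination operates on full recognition lattices, and the lattice operations themselves are not described on this page. As a much-simplified stand-in, assume each feature-specific system yields, for one canonical phone segment, posterior probabilities over a small candidate set: the canonical phone plus its mispronunciation-tendency variants. Averaging those posteriors with per-system weights and picking the best candidate gives a combined decision, in the spirit of posterior-level (confusion-network style) system combination; it is not lattice combination itself. The function names, candidate labels, weights and posteriors below are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Candidate label -> posterior probability for one canonical phone segment.
Posteriors = Dict[str, float]


def combine_posteriors(system_outputs: List[Posteriors],
                       weights: Optional[List[float]] = None) -> Posteriors:
    """Weighted average of per-system posteriors over the union of candidates.

    A candidate absent from one system's output contributes 0 for that system.
    """
    if not system_outputs:
        raise ValueError("need at least one system")
    if weights is None:
        weights = [1.0] * len(system_outputs)
    if len(weights) != len(system_outputs) or sum(weights) <= 0:
        raise ValueError("need one weight per system, with a positive sum")
    total = sum(weights)
    combined: Posteriors = {}
    for post, w in zip(system_outputs, weights):
        for label, p in post.items():
            combined[label] = combined.get(label, 0.0) + w * p / total
    return combined


def detect(canonical: str, combined: Posteriors) -> Tuple[str, bool]:
    """Pick the best candidate; flag a mispronunciation if it is not the canonical phone."""
    best = max(combined, key=combined.get)
    return best, best != canonical


# Hypothetical posteriors for canonical /u/ and a lip-unrounding variant "u+unround".
mfcc = {"u": 0.55, "u+unround": 0.45}
plp = {"u": 0.40, "u+unround": 0.60}
fbank = {"u": 0.35, "u+unround": 0.65}
print(detect("u", combine_posteriors([mfcc, plp, fbank])))  # prints ('u+unround', True)
```

Here the MFCC system alone would accept /u/, but the PLP and FBank systems favour the variant strongly enough to flip the combined decision. Unequal weights are one possible way to reflect that each feature suits some tendencies better than others.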

Key words: computer-aided pronunciation training; mispronunciation detection; deep neural network

Received: 2016-06-29; Published: 2016-11-15

CLC numbers: TP391.7; H193.2

Cite this article:

张劲松, 高迎明, 解焱陆. 基于DNN的发音偏误趋势检测[J]. 清华大学学报（自然科学版）, 2016, 56(11): 1220-1225.
ZHANG Jinsong, GAO Yingming, XIE Yanlu. Mispronunciation tendency detection using deep neural networks. Journal of Tsinghua University (Science and Technology), 2016, 56(11): 1220-1225.

Link to this article:

http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2016.26.015 or http://jst.tsinghuajournals.com/CN/Y2016/V56/I11/1220

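Table 1 Segmental annotation conventions of the CAPT-oriented Chinese interlanguage speech corpus (excerpt)

Figure 1 Framework of the detection system

Figure 2 DNN architecture

Figure 3 Extended pronunciation network

Table 2 Statistics of the experimental corpus

Table 3 Classification of experimental outcomes

Table 4 Detection results of the GMM and DNN models (%)

Figure 4 Detection performance for lip-shape mispronunciations

Figure 5 Detection performance for fronting/backing mispronunciations

Figure 6 Detection performance for shortening mispronunciations

Figure 7 Detection performance for laminalization mispronunciations

Table 5 Detection results of the different acoustic features and the combined system (%)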
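The abstract reports a false rejection rate (FRR), a false acceptance rate (FAR) and a detection accuracy (DA), and Table 3 classifies the experimental outcomes, but the table itself is not reproduced here. The sketch below assumes the four-way split commonly used in mispronunciation detection (true acceptance, false rejection, false acceptance, true rejection) and the usual definitions built on it; the paper's exact definitions may differ. The class name, field names and counts are hypothetical, not the paper's data.

```python
from dataclasses import dataclass


def _ratio(num: int, den: int) -> float:
    """Divide, refusing to report a rate with an empty denominator."""
    if den == 0:
        raise ValueError("undefined rate: denominator is zero")
    return num / den


@dataclass
class DetectionCounts:
    """Segment-level outcome counts for a mispronunciation detector."""
    ta: int  # true acceptance: correct pronunciation accepted
    fr: int  # false rejection: correct pronunciation flagged as an error
    fa: int  # false acceptance: mispronunciation accepted as correct
    tr: int  # true rejection: mispronunciation correctly flagged

    def frr(self) -> float:
        """False rejection rate: share of correct segments wrongly flagged."""
        return _ratio(self.fr, self.ta + self.fr)

    def far(self) -> float:
        """False acceptance rate: share of mispronounced segments missed."""
        return _ratio(self.fa, self.fa + self.tr)

    def da(self) -> float:
        """Detection accuracy: share of all segments judged correctly."""
        return _ratio(self.ta + self.tr, self.ta + self.fr + self.fa + self.tr)


# Hypothetical counts for illustration only.
counts = DetectionCounts(ta=900, fr=50, fa=30, tr=70)
print(f"FRR={counts.frr():.1%} FAR={counts.far():.1%} DA={counts.da():.1%}")
# prints FRR=5.3% FAR=30.0% DA=92.4%
```

Under these definitions DA weights every segment equally, so it is dominated by whichever class has more segments; this is how a low FRR and a high FAR can coexist with a high DA.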
