Tone training for Mandarin two-syllable words based on pitch projection synthesized speech

doi:10.16511/j.cnki.qhdxxb.2017.22.010


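Abstract (Chinese version): This paper uses a pitch projection method: by selecting an appropriate standard voice, it synthesizes teaching speech in which the segments and timbre remain unchanged and only the tone is replaced by that of the standard voice. Using this speech for tone training reduces the information redundancy and interference caused by complex variations in the speech signal other than tone information. With synthesized speech of Mandarin two-syllable tone combinations as the experimental material, a tone training experiment was conducted with Japanese participants. The training results show that the synthesized speech method outperforms the standard speech method in the relative improvement rate of tone perception and production and in generalized production, and is far better than the untrained control group; most of the differences are statistically significant. The results corroborate the existence of a selective attention mechanism in the human brain during speech learning and provide an experimental and theoretical basis for integrating the synthesized speech method into computer-assisted Mandarin tone teaching systems.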

Authors: XIE Yanlu, ZHANG Bei, ZHANG Jinsong

Keywords (Chinese version): speech teaching, speech acquisition, speech synthesis, pitch projection, tone

Abstract: This study uses a pitch projection method to synthesize teaching speech from an appropriately selected standard voice. The teaching speech is synthesized by replacing the lexical tones in the learners' speech with standard tones while keeping the segments and timbre unchanged. This removes the complex variations in the speech signal other than the tones. The method was then used for tone training of Japanese students with synthesized Mandarin two-syllable words. The training results show that the synthesized speech method outperforms the standard voice method in the relative improvement of tone perception and production, as well as in generalized production, and is far better than a control group without training. Most of the differences are statistically significant. The results also support the existence of a selective attention mechanism in the human brain when learning speech. Thus, this study provides an experimental and theoretical basis for integrating speech synthesis methods into computer-assisted Mandarin tone learning systems.
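The abstract does not spell out how the pitch projection is computed. As a rough illustration only, the sketch below assumes one simple possibility: the standard speaker's F0 contour is time-normalized to the learner's length and shifted in the log-F0 domain so that its mean matches the learner's register. The function name `project_pitch`, the voiced-frames-only input, and the mean-matching rule are all assumptions made for this example, not the authors' procedure; resynthesizing audio from the projected contour is not shown.

```python
import numpy as np


def project_pitch(learner_f0, standard_f0):
    """Hypothetical sketch: map a standard tone contour onto a learner's utterance.

    Both inputs are F0 values in Hz for voiced frames only (all values > 0).
    Returns a contour with the standard speaker's tone shape, the learner's
    number of frames, and the learner's mean log-F0.
    """
    learner_f0 = np.asarray(learner_f0, dtype=float)
    standard_f0 = np.asarray(standard_f0, dtype=float)

    # Time-normalize: resample the standard contour to the learner's length.
    src_t = np.linspace(0.0, 1.0, len(standard_f0))
    dst_t = np.linspace(0.0, 1.0, len(learner_f0))
    shape = np.interp(dst_t, src_t, np.log(standard_f0))

    # Assumed register rule: shift log-F0 so its mean equals the learner's.
    shape = shape + (np.log(learner_f0).mean() - shape.mean())
    return np.exp(shape)
```

For example, `project_pitch([200.0] * 5, [100.0, 125.0, 150.0])` yields a five-frame rising contour centered on the learner's 200 Hz register, which is the kind of "same voice, standard tone" target the abstract describes.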

Key words: phonetic teaching, language learning, speech synthesis, pitch projection, tone

Received: 2016-06-19 Published: 2017-02-15

ZTFLH:	H193.2
	TN912.33
	H116.4

Cite this article:

XIE Yanlu, ZHANG Bei, ZHANG Jinsong. Tone training for Mandarin two-syllable words based on pitch projection synthesized speech. Journal of Tsinghua University (Science and Technology), 2017, 57(2): 170-175.

Link to this article:

http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2017.22.010 or http://jst.tsinghuajournals.com/CN/Y2017/V57/I2/170

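Fig. 1 Block diagram of tone mapping

Table 1 Basic information on the Japanese participants in each experimental group

Fig. 2 Flowchart of the tone acquisition experiment

Fig. 3 Accuracy of the three participant groups on the perception pre-test and post-test

Table 2 Significance analysis of the differences in relative improvement rate between perception pre-test and post-test for the training groups versus the control group

Fig. 4 Accuracy of the three participant groups on the production pre-test, post-test, and generalization test

Table 3 Significance tests of accuracy on the pre-test, post-test, and generalization test

Table 4 Accuracy on the production pre-test and post-test for the four tones, with significance tests of the differences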


