Journal of Tsinghua University (Science and Technology)  2017, Vol. 57 Issue (2): 202-207    DOI: 10.16511/j.cnki.qhdxxb.2017.22.015
Information Engineering
Describing and predicting affective messages for expressive speech synthesis
GAO Yingying, ZHU Weibin
Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China
Abstract: To describe emotion in finer detail and predict it automatically in expressive speech synthesis, a multi-perspective emotion description model is proposed. The model characterizes how speech emotion is generated and how it evolves from four perspectives: cognitive appraisal, psychological feeling, physical response, and utterance manner. A deep stacking network, a multilayer neural network that supports distributed representations and has a stacking structure, is then used to build a model that predicts the emotion description from text. Experimental results show that adding different emotion components and contextual information as features improves prediction, which supports both the effectiveness of the deep stacking network for emotion prediction and the soundness of the multi-perspective emotion description model.
Key words: speech synthesis; emotion description; text-based emotion prediction; deep neural network
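The deep stacking network named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the general design of reference [10], in which each module has a sigmoid hidden layer and a linear output layer, the output weights are solved in closed form by least squares, and each higher module receives the raw input concatenated with the output of the module below. The class names, the ridge term, the random (untrained) lower-layer weights, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class DSNModule:
    """One module: sigmoid hidden layer followed by a linear output layer."""

    def __init__(self, n_in, n_hidden, rng):
        # Assumption: lower-layer weights W stay at their random initial values;
        # a full system would pre-train and/or fine-tune them.
        self.W = rng.normal(scale=0.1, size=(n_in, n_hidden))
        self.U = None  # upper-layer weights, set by fit()

    def hidden(self, X):
        return sigmoid(X @ self.W)

    def fit(self, X, T, ridge=1e-3):
        # Closed-form ridge-regularized least squares for the output weights:
        # U = (H^T H + ridge * I)^(-1) H^T T
        H = self.hidden(X)
        A = H.T @ H + ridge * np.eye(H.shape[1])
        self.U = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        return self.hidden(X) @ self.U


class DeepStackingNetwork:
    """Stack of modules; each higher module sees [raw input, lower output]."""

    def __init__(self, n_modules=3, n_hidden=32, seed=0):
        self.n_modules = n_modules
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)
        self.modules = []

    def fit(self, X, T):
        self.modules = []
        inp = X
        for _ in range(self.n_modules):
            module = DSNModule(inp.shape[1], self.n_hidden, self.rng).fit(inp, T)
            self.modules.append(module)
            # Input to the next module: raw features plus this module's output.
            inp = np.hstack([X, module.predict(inp)])
        return self

    def predict(self, X):
        inp = X
        Y = None
        for module in self.modules:
            Y = module.predict(inp)
            inp = np.hstack([X, Y])
        return Y  # output of the top module


if __name__ == "__main__":
    # Toy data only: 6 hypothetical text features, 4 hypothetical targets.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 6))
    T = rng.normal(size=(50, 4))
    net = DeepStackingNetwork(n_modules=3, n_hidden=16).fit(X, T)
    print(net.predict(X).shape)
```

Because every module is trained against the same targets and only its output weights are solved for, each module's fit is a convex problem, and modules can be added one at a time without retraining the ones below.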
Received: 2016-06-29      Published: 2017-02-21
CLC number: TN912.33; TP391.1; TP183
Corresponding author: ZHU Weibin, associate professor, E-mail: wbzhu@bjtu.edu.cn
Cite this article:
GAO Yingying, ZHU Weibin. Describing and predicting affective messages for expressive speech synthesis. Journal of Tsinghua University (Science and Technology), 2017, 57(2): 202-207.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2017.22.015  or  http://jst.tsinghuajournals.com/CN/Y2017/V57/I2/202
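  Fig. 1 Schematic of the speech emotion generation process [9]
  Fig. 2 Multi-perspective emotion description model [9]
  Fig. 3 Multi-scale text-based emotion prediction model
  Fig. 4 Module structure and connections of the deep stacking network
  Table 1 Prediction results with different emotion components added
  Table 2 Emotion prediction results with discourse-level and paragraph-level emotion information added
  Table 3 Emotion prediction results with the previous sentence's emotion information added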
[1] Govind D, Prasanna S R M. Expressive speech synthesis:A review[J]. International Journal of Speech Technology, 2013, 16(2):237-260.
[2] XU Jun, CAI Lianhong. Hierarchical prosody analysis and modeling for emotional conversions[J]. J Tsinghua Univ:Sci & Tech, 2009, 49(S1):1274-1277. (in Chinese)
[3] TAO Jianhua, KANG Yongguo, LI Aijun. Prosody conversion from neutral speech to emotional speech[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2006, 14(4):1145-1154.
[4] HAN Jiqing, SHAO Yanqiu. Research progress of emotion processing based on speech signal[J]. Audio Engineering, 2006(5):58-62. (in Chinese)
[5] Ekman P, Friesen W V, O'Sullivan M, et al. Universals and cultural differences in the judgments of facial expressions of emotion[J]. Journal of Personality and Social Psychology, 1987, 53(4):712-717.
[6] Cowie R, Douglas-Cowie E, Savvidou S, et al. FEELTRACE:An instrument for recording perceived emotion in real time[C]//ISCA Tutorial and Research Workshop (ITRW) on Speech and Emotion. Newcastle, UK, 2000:19-24.
[7] Mehrabian A. Pleasure-arousal-dominance:A general framework for describing and measuring individual differences in temperament[J]. Current Psychology, 1996, 14(4):261-292.
[8] Moors A, Ellsworth P C, Scherer K R, et al. Appraisal theories of emotion:State of the art and future development[J]. Emotion Review, 2013, 5(2):119-124.
[9] GAO Yingying, ZHU Weibin. The research for the description system of speech emotion[J]. Chinese Journal of Phonetics, 2013, 4:71-81. (in Chinese)
[10] DENG Li, YU Dong, Platt J. Scalable stacking and learning for building deep architectures[C]//IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Kyoto, Japan, 2012:2133-2136.
[11] Riedl M, Biemann C. Text segmentation with topic models[J]. Journal for Language Technology and Computational Linguistics, 2012, 27(1):47-69.
[12] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet allocation[J]. The Journal of Machine Learning Research, 2003, 3:993-1022.
[13] Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks[J]. Science, 2006, 313(5786):504-507.
[14] Hinton G. A practical guide to training restricted Boltzmann machines[J]. Momentum, 2010, 9(1):599-619.
[15] YU Dong, DENG Li. Accelerated parallelizable neural network learning algorithm for speech recognition[C]//12th Annual Conference of the International Speech Communication Association (INTERSPEECH). Florence, Italy:ISCA Press, 2011:2281-2284.