Journal of Tsinghua University (Science and Technology), 2017, Vol. 57, Issue 2: 202-207. DOI: 10.16511/j.cnki.qhdxxb.2017.22.015
INFORMATION ENGINEERING
Describing and predicting affective messages for expressive speech synthesis
GAO Yingying, ZHU Weibin
Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China
Abstract: A multi-perspective emotion model is presented that provides richer detail about the emotions involved in expressive speech synthesis and facilitates automatic prediction. The model describes how an emotion develops in terms of cognitive appraisal, psychological feeling, physical response, and utterance manner. This descriptive model is then used to build a text-based emotion prediction model on a deep neural network (the deep stacking network), which supports distributed representations and has a stacked structure. Tests validate the benefits of this prediction method in capturing the interactions among the different emotional aspects and the contextual effects, as well as the effectiveness of the deep stacking network and the multi-perspective emotion model.
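The deep stacking network (DSN) named in the abstract builds depth by stacking simple modules, where each module re-reads the raw input features together with the predictions of the modules below it. The sketch below is a minimal illustration of that general recipe (random hidden weights plus a least-squares output layer per module, as in the DSN literature, reference [10]); it is not the authors' implementation, and all class and parameter names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DSNModule:
    """One DSN module: a single-hidden-layer network with fixed random
    hidden weights and an output layer fit in closed form (least squares)."""
    def __init__(self, in_dim, hidden_dim):
        self.W = rng.normal(size=(in_dim, hidden_dim))  # random hidden weights
        self.U = None                                   # output weights, fit below

    def fit(self, X, Y):
        H = sigmoid(X @ self.W)          # hidden activations
        self.U = np.linalg.pinv(H) @ Y   # least-squares output weights
        return H @ self.U                # this module's training predictions

    def predict(self, X):
        return sigmoid(X @ self.W) @ self.U

class DeepStackingNetwork:
    """Stack modules: module k sees the raw features concatenated with the
    predictions of all lower modules."""
    def __init__(self, n_modules, hidden_dim):
        self.n_modules = n_modules
        self.hidden_dim = hidden_dim
        self.modules = []

    def fit(self, X, Y):
        Z = X
        for _ in range(self.n_modules):
            m = DSNModule(Z.shape[1], self.hidden_dim)
            pred = m.fit(Z, Y)
            self.modules.append(m)
            Z = np.hstack([Z, pred])     # widen input for the next module
        return self

    def predict(self, X):
        Z, pred = X, None
        for m in self.modules:
            pred = m.predict(Z)
            Z = np.hstack([Z, pred])
        return pred                      # top module's prediction
```

Because each module's output layer has a closed-form solution, training needs no end-to-end backpropagation, which is the property that makes the architecture easy to train module by module.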
Keywords: speech synthesis; emotion description; text-based emotion prediction; deep neural network
CLC (ZTFLH): TN912.33; TP391.1; TP183
Issue Date: 15 February 2017
Cite this article:   
GAO Yingying, ZHU Weibin. Describing and predicting affective messages for expressive speech synthesis[J]. Journal of Tsinghua University (Science and Technology), 2017, 57(2): 202-207.
URL:  
http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2017.22.015     OR     http://jst.tsinghuajournals.com/EN/Y2017/V57/I2/202
[1] Govind D, Prasanna S R M. Expressive speech synthesis: A review[J]. International Journal of Speech Technology, 2013, 16(2): 237-260.
[2] XU Jun, CAI Lianhong. Hierarchical prosody analysis and modeling for emotional conversions[J]. Journal of Tsinghua University (Science and Technology), 2009, 49(S1): 1274-1277. (in Chinese)
[3] TAO Jianhua, KANG Yongguo, LI Aijun. Prosody conversion from neutral speech to emotional speech[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2006, 14(4): 1145-1154.
[4] HAN Jiqing, SHAO Yanqiu. Research progress of emotion processing based on speech signal[J]. Audio Engineering, 2006(5): 58-62. (in Chinese)
[5] Ekman P, Friesen W V, O'Sullivan M, et al. Universals and cultural differences in the judgments of facial expressions of emotion[J]. Journal of Personality and Social Psychology, 1987, 53(4): 712-717.
[6] Cowie R, Douglas-Cowie E, Savvidou S, et al. FEELTRACE: An instrument for recording perceived emotion in real time[C]// ISCA Tutorial and Research Workshop (ITRW) on Speech and Emotion. Newcastle, UK, 2000: 19-24.
[7] Mehrabian A. Pleasure-arousal-dominance: A general framework for describing and measuring individual differences in temperament[J]. Current Psychology, 1996, 14(4): 261-292.
[8] Moors A, Ellsworth P C, Scherer K R, et al. Appraisal theories of emotion: State of the art and future development[J]. Emotion Review, 2013, 5(2): 119-124.
[9] GAO Yingying, ZHU Weibin. The research for the description system of speech emotion[J]. Chinese Journal of Phonetics, 2013, 4: 71-81. (in Chinese)
[10] DENG Li, YU Dong, Platt J. Scalable stacking and learning for building deep architectures[C]// IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Kyoto, Japan, 2012: 2133-2136.
[11] Riedl M, Biemann C. Text segmentation with topic models[J]. Journal for Language Technology and Computational Linguistics, 2012, 27(1): 47-69.
[12] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet allocation[J]. The Journal of Machine Learning Research, 2003, 3: 993-1022.
[13] Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks[J]. Science, 2006, 313(5786): 504-507.
[14] Hinton G. A practical guide to training restricted Boltzmann machines[J]. Momentum, 2010, 9(1): 599-619.
[15] YU Dong, DENG Li. Accelerated parallelizable neural network learning algorithm for speech recognition[C]// 12th Annual Conference of the International Speech Communication Association (INTERSPEECH). Florence, Italy: ISCA Press, 2011: 2281-2284.