Journal of Tsinghua University(Science and Technology)    2016, Vol. 56 Issue (11) : 1179-1183     DOI: 10.16511/j.cnki.qhdxxb.2016.26.008
ELECTRONIC ENGINEERING |
Cross-corpus speech emotion recognition based on a feature transfer learning method
SONG Peng1, ZHENG Wenming2, ZHAO Li2
1. School of Computer and Control Engineering, Yantai University, Yantai 264005, China;
2. Key Laboratory of Child Development and Learning Science of Ministry of Education, Southeast University, Nanjing 210096, China
Abstract  Speech emotion recognition systems often use training and testing data drawn from different corpora, which causes recognition rates to drop drastically. This paper presents a feature transfer learning method for cross-corpus speech emotion recognition. The maximum mean discrepancy (MMD) is used to measure the similarity between the emotional feature distributions of different corpora; a closely matched low-dimensional latent feature space is then obtained via maximum mean discrepancy embedding (MMDE) and dimension reduction algorithms, and classifiers are trained in this space for emotion recognition. A semi-supervised discriminative analysis (SDA) algorithm is further used for dimension reduction to better preserve the class discrimination of the emotional features. Tests on two popular speech emotion datasets demonstrate that this method effectively improves recognition rates for cross-corpus speech emotion recognition.
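The MMD criterion at the heart of this method can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the RBF kernel choice, the `gamma` bandwidth parameter, and the function names are assumptions, and the biased (V-statistic) estimator is used for brevity.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF (Gaussian) kernel between the rows of X and Y.
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-gamma * sq_dists)

def mmd2(X, Y, gamma=1.0):
    # Biased estimate of the squared maximum mean discrepancy
    # between the sample sets X (source corpus) and Y (target corpus).
    Kxx = rbf_kernel(X, X, gamma)
    Kyy = rbf_kernel(Y, Y, gamma)
    Kxy = rbf_kernel(X, Y, gamma)
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()
```

A small MMD value indicates that the two feature distributions are close; MMDE seeks a low-dimensional embedding that minimizes this quantity between corpora, after which an ordinary classifier can be trained in the shared space.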
Keywords speech emotion recognition      transfer learning      feature dimension reduction      semi-supervised discriminative analysis     
CLC number:  TN912.3
Issue Date: 15 November 2016
Cite this article:   
SONG Peng, ZHENG Wenming, ZHAO Li. Cross-corpus speech emotion recognition based on a feature transfer learning method[J]. Journal of Tsinghua University(Science and Technology), 2016, 56(11): 1179-1183.
URL:  
http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2016.26.008     OR     http://jst.tsinghuajournals.com/EN/Y2016/V56/I11/1179