Journal of Tsinghua University (Science and Technology), 2016, Vol. 56, Issue 11: 1179-1183    DOI: 10.16511/j.cnki.qhdxxb.2016.26.008
Electronic Engineering
Cross-corpus speech emotion recognition based on a feature transfer learning method
SONG Peng1, ZHENG Wenming2, ZHAO Li2
1. School of Computer and Control Engineering, Yantai University, Yantai 264005, China;
2. Key Laboratory of Child Development and Learning Science of Ministry of Education, Southeast University, Nanjing 210096, China
Abstract: In practical speech emotion recognition systems, the training and test utterances often come from different corpora, which causes recognition rates to drop significantly. This paper presents a feature transfer learning method for cross-corpus speech emotion recognition. The maximum mean discrepancy (MMD) is used to measure the similarity between the emotional feature distributions of the different corpora. A shared low-dimensional feature space in which the two distributions are close is then found via maximum mean discrepancy embedding (MMDE) together with a dimension reduction algorithm, and the emotion classifier is trained in this space. To better preserve the class discrimination of the emotional features, semi-supervised discriminant analysis (SDA) is further used for the dimension reduction. Experiments on two popular speech emotion datasets show that the method effectively improves recognition rates under cross-corpus conditions.
Key words: speech emotion recognition; transfer learning; feature dimension reduction; semi-supervised discriminant analysis
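The MMD mentioned in the abstract can be illustrated with a small numerical sketch. The code below is not the authors' implementation: it assumes a Gaussian (RBF) kernel, a hypothetical bandwidth parameter `gamma`, and synthetic Gaussian vectors standing in for acoustic features from two corpora. It computes the biased empirical estimate of the squared MMD, which should be close to zero when both samples come from the same distribution and larger when one corpus is shifted.

```python
import math
import random


def rbf_kernel(x, y, gamma):
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(a, b, gamma):
    """Average kernel value over all pairs (one vector from a, one from b)."""
    total = sum(rbf_kernel(x, y, gamma) for x in a for y in b)
    return total / (len(a) * len(b))


def mmd2(source, target, gamma=0.1):
    """Biased empirical estimate of the squared MMD between two samples."""
    return (
        mean_kernel(source, source, gamma)
        + mean_kernel(target, target, gamma)
        - 2.0 * mean_kernel(source, target, gamma)
    )


def sample(rng, n, dim, mean):
    """Draw n synthetic feature vectors from an isotropic Gaussian (assumption)."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
corpus_a = sample(rng, 100, 5, 0.0)          # stand-in for the training corpus
corpus_b_same = sample(rng, 100, 5, 0.0)     # same distribution as corpus_a
corpus_b_shifted = sample(rng, 100, 5, 1.5)  # mismatched corpus

mmd_same = mmd2(corpus_a, corpus_b_same)
mmd_shifted = mmd2(corpus_a, corpus_b_shifted)
print(f"same distribution:    {mmd_same:.4f}")
print(f"shifted distribution: {mmd_shifted:.4f}")
```

In a transfer learning setting, a quantity of this kind serves as the criterion to be reduced when searching for a shared feature space. The MMDE optimization and the SDA projection described in the abstract are not reproduced here.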
Received: 2016-06-19      Published: 2016-11-26
CLC number (ZTFLH):  TN912.3
Cite this article:
SONG Peng, ZHENG Wenming, ZHAO Li. Cross-corpus speech emotion recognition based on a feature transfer learning method. Journal of Tsinghua University (Science and Technology), 2016, 56(11): 1179-1183.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2016.26.008  or  http://jst.tsinghuajournals.com/CN/Y2016/V56/I11/1179
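Fig. 1 Speech emotion recognition framework based on feature transfer learning
Table 1 Low-level descriptors used in the experiments
Table 2 Comparison of emotion recognition rates under different schemes
Fig. 3 Emotion confusion matrix obtained with scheme 2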