Journal of Tsinghua University (Science and Technology), 2018, Vol. 58, Issue 4: 347-351    DOI: 10.16511/j.cnki.qhdxxb.2018.26.014
Computer Science and Technology
Joint subspace learning and feature selection method for speech emotion recognition
SONG Peng1, ZHENG Wenming2, ZHAO Li2
1. School of Computer and Control Engineering, Yantai University, Yantai 264005, China;
2. Key Laboratory of Child Development and Learning Science of Ministry of Education, Southeast University, Nanjing 210096, China
Full text: PDF (1045 KB)
Abstract: Traditional speech emotion recognition is usually trained and tested on a single emotion corpus. In practice, however, the training and test utterances often come from different corpora, and the recognition rate drops accordingly. This paper therefore proposes a speech emotion recognition method that fuses subspace learning and feature selection. A regression method is used to learn a subspace representation of the features; at the same time, an l2,1-norm is introduced for feature selection, and the maximum mean discrepancy (MMD) is used to reduce the feature divergence between different emotion corpora. These terms are optimized jointly to extract a more robust emotional feature representation. Experimental evaluations on the two public emotion corpora EMO-DB and eNTERFACE show that the method performs well under cross-corpus conditions and is more robust and efficient than other classical transfer learning methods.
Key words: feature selection; subspace learning; emotion recognition
Abstract: Traditional speech emotion recognition methods are trained and evaluated on a single corpus. However, when training and testing use different corpora, the recognition performance drops drastically. A joint subspace learning and feature selection method is presented here to improve recognition. In this method, the feature subspace is learned via a regression algorithm, with the l2,1-norm used for feature selection. The maximum mean discrepancy (MMD) is then used to measure the feature divergence between different corpora. Tests show that this algorithm gives satisfactory results for cross-corpus speech emotion recognition and is more robust and efficient than state-of-the-art transfer learning methods.
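The abstract does not give the exact formulation of the joint optimization. A minimal numpy sketch of the kind of objective it describes — a least-squares regression term on the labeled source corpus, an l2,1-norm penalty whose zeroed rows deselect features, and a linear-kernel MMD term aligning source and target features in the learned subspace (the function names, trade-off weights `lam`/`mu`, and the choice of a linear kernel are illustrative assumptions, not the paper's definitive formulation) — might look like:

```python
import numpy as np

def l21_norm(W):
    # Sum of row-wise Euclidean norms; rows of W driven to zero
    # effectively deselect the corresponding input features.
    return np.sum(np.linalg.norm(W, axis=1))

def linear_mmd(A, B):
    # Squared maximum mean discrepancy with a linear kernel:
    # squared distance between the mean embeddings of the two sample sets.
    return np.sum((A.mean(axis=0) - B.mean(axis=0)) ** 2)

def objective(W, Xs, Ys, Xt, lam=0.1, mu=0.1):
    # Regression loss on the labeled source corpus (Xs, Ys),
    # plus l2,1 sparsity on the projection W,
    # plus cross-corpus MMD between source and target in the subspace.
    loss = np.linalg.norm(Xs @ W - Ys, "fro") ** 2
    return loss + lam * l21_norm(W) + mu * linear_mmd(Xs @ W, Xt @ W)
```

Minimizing such an objective over W (e.g., by the iteratively reweighted scheme commonly used for l2,1-regularized problems) would yield a projection that fits the source labels while selecting features whose distributions agree across corpora.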
Received: 2017-09-21; Published: 2018-04-15
CLC number: TN912.3
Funding: National Natural Science Foundation of China (Nos. 61703360, 61602399); Natural Science Foundation of Shandong Province (Nos. ZR2014FQ016, ZR2016FB22, ZR2017QF006); Fundamental Research Funds of Southeast University (No. CDLS-2017-02)
About the author: SONG Peng (1983-), male, lecturer. E-mail: pengsongseu@gmail.com
Cite this article:
SONG Peng, ZHENG Wenming, ZHAO Li. Joint subspace learning and feature selection method for speech emotion recognition. Journal of Tsinghua University (Science and Technology), 2018, 58(4): 347-351.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2018.26.014 or http://jst.tsinghuajournals.com/CN/Y2018/V58/I4/347
Fig. 1  Proposed speech emotion recognition framework
Table 1  Emotion recognition results for schemes 1 and 2
Fig. 2  Recognition rates for each emotion under scheme 1
Fig. 3  Recognition rates for each emotion under scheme 2
Copyright © Editorial Office of Journal of Tsinghua University (Science and Technology)