COMPUTER SCIENCE AND TECHNOLOGY
Joint subspace learning and feature selection method for speech emotion recognition |
SONG Peng1, ZHENG Wenming2, ZHAO Li2 |
1. School of Computer and Control Engineering, Yantai University, Yantai 264005, China; 2. Key Laboratory of Child Development and Learning Science of Ministry of Education, Southeast University, Nanjing 210096, China |
Abstract Traditional speech emotion recognition methods are trained and evaluated on a single corpus. However, when the training and testing data are drawn from different corpora, recognition performance drops drastically. A joint subspace learning and feature selection method is presented here to improve cross-corpus recognition. In this method, the feature subspace is learned via a regression algorithm, with the l2,1-norm imposed on the projection for feature selection. The maximum mean discrepancy (MMD) is then used to measure the feature divergence between different corpora. Tests show that this algorithm gives satisfactory results for cross-corpus speech emotion recognition and is more robust and efficient than state-of-the-art transfer learning methods.
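The abstract names two ingredients: an l2,1-regularized regression that learns the feature subspace while driving uninformative feature rows of the projection to zero, and an MMD term that measures how far apart the source and target corpora remain after projection. As a concrete illustration only, the NumPy sketch below evaluates one plausible joint cost of this kind; the function names (l21_norm, linear_mmd, objective) and the trade-off weights alpha and beta are illustrative assumptions, not the authors' published formulation.

```python
import numpy as np

def l21_norm(W):
    """l2,1-norm: sum of the l2 norms of the rows of W.
    Rows pushed to zero by this penalty correspond to discarded features."""
    return np.sum(np.linalg.norm(W, axis=1))

def linear_mmd(Xs, Xt):
    """MMD with a linear kernel: squared distance between the
    mean feature vectors of the source and target samples."""
    return np.sum((Xs.mean(axis=0) - Xt.mean(axis=0)) ** 2)

def objective(W, Xs, Ys, Xt, alpha=1.0, beta=1.0):
    """Hypothetical joint cost: regression fit on the labeled source
    corpus (Xs, Ys), l2,1 sparsity on W for feature selection, and
    MMD alignment of the projected source and target corpora."""
    fit = np.linalg.norm(Xs @ W - Ys, "fro") ** 2
    return fit + alpha * l21_norm(W) + beta * linear_mmd(Xs @ W, Xt @ W)
```

Because the l2,1 term is convex but not smooth, a cost of this form is typically minimized by alternating updates or iteratively reweighted least squares rather than plain gradient descent.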
Keywords: feature selection; subspace learning; emotion recognition
Issue Date: 15 April 2018