Speaker verification based on SVM and total variability
GUO Wu1, ZHANG Sheng1, XU Jie2, HU Guoping3, MA Xiaokong1
1. Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230026, China;
2. National Computer Network Emergency Response Technical Team Coordination Center of China, Beijing 100029, China;
3. IFLYTEK Corporation, Hefei 230088, China
Abstract: The total variability factor extractor and probabilistic linear discriminant analysis (PLDA) are the state of the art for text-independent speaker verification. This study combines a support vector machine (SVM) with PLDA. The low-dimensional i-vectors of the total variability system are used as inputs to the SVM, with a cosine kernel function used to achieve better discrimination. This method achieves considerable performance improvement with the PLDA system. Furthermore, score fusion of the SVM with the PLDA gives even better results. Tests were conducted on the female part of the interview section of the NIST 2012 core test corpus. The detection cost function (DCF) of the fusion system was reduced by 25.1% for common condition 1 and 25.2% for condition 3 compared with the best results for a single system.
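As a rough illustration of the two operations named in the abstract, the cosine kernel between i-vectors and the fusion of SVM and PLDA scores might be sketched as follows. This is a minimal sketch, not the authors' implementation: the function names, the plain linear fusion rule, and the weight `alpha` are assumptions made for illustration, and the abstract does not specify how the scores were fused.

```python
import math


def cosine_kernel(w1, w2):
    """Cosine similarity between two i-vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(w1, w2))
    norm1 = math.sqrt(sum(a * a for a in w1))
    norm2 = math.sqrt(sum(b * b for b in w2))
    return dot / (norm1 * norm2)


def fuse_scores(svm_score, plda_score, alpha=0.5):
    """Weighted linear fusion of two scores.

    alpha is a hypothetical fusion weight chosen for illustration;
    the abstract does not state the fusion rule or its parameters.
    """
    return alpha * svm_score + (1.0 - alpha) * plda_score
```

Because the cosine kernel depends only on the angle between the two vectors, rescaling either i-vector leaves the kernel value unchanged. In practice the two score streams would usually be calibrated to a common range before fusion.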
GUO Wu, ZHANG Sheng, XU Jie, HU Guoping, MA Xiaokong. Speaker verification based on SVM and total variability[J]. Journal of Tsinghua University (Science and Technology), 2017, 57(3): 240-243.