Speaker verification based on SVM and total variability

GUO Wu; ZHANG Sheng; XU Jie; HU Guoping; MA Xiaokong

Journal of Tsinghua University (Science and Technology), 2017, Vol. 57, Issue 3: 240-243. DOI: 10.16511/j.cnki.qhdxxb.2017.26.003

Computer Science and Technology

Speaker verification based on SVM and total variability

GUO Wu¹, ZHANG Sheng¹, XU Jie², HU Guoping³, MA Xiaokong¹

Abstract

Total variability factor analysis combined with probabilistic linear discriminant analysis (PLDA) is the current state of the art for text-independent speaker verification. This study combines total variability analysis with a support vector machine (SVM): the low-dimensional total variability factors (i-vectors) are used as the SVM input features, and a cosine kernel function is used to improve their discriminability. The resulting system achieves performance comparable to that of the mainstream PLDA system. Furthermore, fusing the scores of the SVM system with those of the PLDA system gives a clear further improvement. Tests were conducted on the female part of the interview section of the NIST 2012 core test corpus under the common test conditions. The detection cost function (DCF) of the fused system was reduced by 25.1% for common condition 1 and by 25.2% for common condition 3, relative to the best single system.
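The two ideas in the abstract, a cosine kernel over i-vectors and score-level fusion with PLDA, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the function names, the way the trained SVM is represented (support i-vectors, dual coefficients, bias), and the fusion rule (z-normalization of each system's scores followed by a weighted sum). The abstract does not say how the scores were actually fused.

```python
import math


def cosine_kernel(w1, w2):
    """Cosine similarity between two i-vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(w1, w2))
    norm1 = math.sqrt(sum(a * a for a in w1))
    norm2 = math.sqrt(sum(b * b for b in w2))
    return dot / (norm1 * norm2)


def svm_score(test_ivec, support_ivecs, dual_coefs, bias):
    """Kernel SVM decision value: sum_i coef_i * k(sv_i, x) + bias.

    The support vectors, coefficients, and bias are assumed to come from
    an SVM already trained elsewhere (hypothetical inputs).
    """
    return bias + sum(
        coef * cosine_kernel(sv, test_ivec)
        for sv, coef in zip(support_ivecs, dual_coefs)
    )


def znorm(scores):
    """Shift and scale a list of scores to zero mean and unit variance.

    Assumes the scores are not all identical (non-zero standard deviation).
    """
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    return [(s - mean) / std for s in scores]


def fuse_scores(svm_scores, plda_scores, weight=0.5):
    """Weighted sum of z-normalized scores from the two systems.

    This fusion rule is an assumption, not the paper's stated method.
    """
    svm_n = znorm(svm_scores)
    plda_n = znorm(plda_scores)
    return [weight * s + (1.0 - weight) * p for s, p in zip(svm_n, plda_n)]


def relative_reduction(single_system_dcf, fused_dcf):
    """Relative DCF reduction of the fused system versus a single system."""
    return (single_system_dcf - fused_dcf) / single_system_dcf
```

Because the cosine kernel divides by both vector norms, it is insensitive to i-vector length and depends only on direction. The reported 25.1% and 25.2% figures are relative reductions of the kind computed by `relative_reduction`; the underlying DCF values are not given in the abstract.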

Cite this article

GUO Wu, ZHANG Sheng, XU Jie, HU Guoping, MA Xiaokong. Speaker verification based on SVM and total variability[J]. Journal of Tsinghua University (Science and Technology). 2017, 57(3): 240-243. https://doi.org/10.16511/j.cnki.qhdxxb.2017.26.003

CLC number: TN912.34


Received: 2016-06-21
Published: 2017-03-15
Release date: 2017-03-15