Score regulation based on GMM token ratio similarity for speaker recognition

doi:10.16511/j.cnki.qhdxxb.2017.21.006


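Abstract (translated from the Chinese): This paper proposes a speaker recognition method based on Gaussian mixture model (GMM) token ratio similarity based score regulation (GTRSR). Within the GMM-UBM (universal background model) framework, a GMM token is the index of the UBM Gaussian component that gives a feature frame its highest score. During adaptation training and testing, the proportion of frames assigned to each token (the token ratio) is computed and stored for the adaptation training utterance and for the test utterance. In the test phase, the similarity between the token ratios of the test utterance and the adaptation training utterance (GTRS) is computed. When the GTRS is below a threshold, the test score is multiplied by a penalty factor, and the result is taken as the final score of the test utterance. Experiments on the MASC database show that the method yields some improvement in recognition performance.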
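The steps described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation. The function names are hypothetical, the UBM is assumed to have diagonal covariances, the per-frame "score" is assumed to be the weighted component log-likelihood, and histogram intersection stands in for the similarity measure, which the abstract does not specify. The threshold and penalty factor are likewise left as free parameters.

```python
import numpy as np

def gmm_tokens(frames, weights, means, variances):
    """Per frame, return the index of the top-scoring UBM component (the GMM token).

    frames: (T, D); weights: (K,); means, variances: (K, D).
    ASSUMPTION: diagonal covariances, score = weighted log-likelihood.
    """
    diff = frames[:, None, :] - means[None, :, :]              # (T, K, D)
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)                  # (T, K)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)      # (K,)
    )
    return np.argmax(np.log(weights) + log_gauss, axis=1)      # (T,)

def token_ratio(tokens, n_components):
    """GMM token ratio (GTR): fraction of frames assigned to each component."""
    counts = np.bincount(tokens, minlength=n_components)
    return counts / counts.sum()

def ratio_similarity(gtr_a, gtr_b):
    """ASSUMPTION: histogram intersection as the similarity (range 0 to 1)."""
    return float(np.minimum(gtr_a, gtr_b).sum())

def regulate_score(score, similarity, threshold, penalty):
    """Multiply the score by the penalty factor when similarity is below the threshold."""
    return score * penalty if similarity < threshold else score
```

In use, the training GTR would be stored alongside the adapted speaker model, and `regulate_score` would be applied to each trial's score. Whether multiplying by the penalty factor lowers the score depends on the sign convention of the score and on the factor chosen, neither of which is stated in the abstract.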


Keywords: speaker recognition, GMM token ratio (GTR), score regulation

Abstract: This paper presents a score regulation approach for speaker recognition that uses GMM token ratio similarity to judge the reliability of a test score. Within the GMM-UBM (universal background model) framework, the GMM token of a frame is the index of the UBM component that gives that frame the highest score. During the training and testing phases, the tokens of all frames in an utterance are recorded, and the proportion of frames assigned to each component forms a vector called the GMM token ratio (GTR) of the utterance. In the test phase, the GTR of the test utterance is compared with the GTR of the target speaker's training utterance to compute a similarity. When the similarity is below a threshold, the original likelihood score is multiplied by a penalty factor, and the result is taken as the final score of the test utterance. Tests on the MASC corpus show that this method improves speaker recognition performance.
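The procedure in the abstract can be written compactly as follows. The symbols are introduced here for illustration and are not taken from the paper: x_t is the t-th of T feature frames, (w_k, mu_k, Sigma_k) are the parameters of the K UBM components, sim is an unspecified similarity function, theta is the threshold, alpha is the penalty factor, and S is the original test score. Including the component weight w_k in the per-frame score is an assumption.

```latex
\begin{align*}
c_t &= \arg\max_{1 \le k \le K} \; w_k \, \mathcal{N}(x_t;\, \mu_k, \Sigma_k),
      \qquad t = 1, \dots, T \\
r_k &= \frac{1}{T} \sum_{t=1}^{T} \mathbf{1}[c_t = k],
      \qquad k = 1, \dots, K \\
s &= \mathrm{sim}\bigl(r^{\mathrm{test}},\, r^{\mathrm{train}}\bigr) \\
\hat{S} &=
  \begin{cases}
    \alpha S, & s < \theta \\
    S,        & s \ge \theta
  \end{cases}
\end{align*}
```

Here c_t is the GMM token of frame t, the vector r = (r_1, ..., r_K) is the GTR of an utterance, and the regulated value is the final score assigned to the test utterance.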

Key words: speaker recognition; GMM token ratio (GTR); score regulation

Received: 2016-07-05; Published: 2017-01-15

CLC number (ZTFLH): TP391.43

Cite this article:

YANG Yingchun, DENG Licai. Score regulation based on GMM token ratio similarity for speaker recognition. Journal of Tsinghua University (Science and Technology), 2017, 57(1): 28-32.

Link to this article:

http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2017.21.006 or http://jst.tsinghuajournals.com/CN/Y2017/V57/I1/28

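Fig. 1 Flowchart of score regulation based on the GMM token ratio

Table 1 EER and IR of methods 1, 2, and 3

Table 2 EER and IR of methods 4 and 5

Table 3 Effect of the threshold on method 5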

