Journal of Tsinghua University (Science and Technology), 2016, Vol. 56, Issue (11): 1143-1148    DOI: 10.16511/j.cnki.qhdxxb.2016.26.002
Electronic Engineering
Speaker recognition system based on deep neural networks and bottleneck features
TIAN Yao, CAI Meng, HE Liang, LIU Jia
Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Full text: PDF (1083 KB)
Export: BibTeX | EndNote (RIS)
Abstract: A hybrid model combining the deep neural network (DNN) from speech recognition with the i-vector model from speaker recognition has recently been shown to be effective for speaker recognition. To further improve system performance, this paper uses a DNN trained with speaker labels to extract bottleneck features that replace the original short-term spectral features when the sufficient statistics are computed, so that the statistics carry more speaker-specific information. Tests on the NIST SRE 2008 female telephone-telephone English task demonstrate the effectiveness of this method: relative to the short-term spectral features, the bottleneck features reduce the equal error rate (EER) by 7.65% and the minimum detection cost function (minDCF) by 5.71%.
Key words: speaker recognition; deep neural network; bottleneck features
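To make the front end described in the abstract concrete, below is a minimal sketch of how per-frame bottleneck features can be read off a speaker-label-trained DNN. The topology, the ReLU activations, and the random weights (standing in for a trained network, which is not available here) are illustrative assumptions, not the paper's actual configuration; in the paper's pipeline these frame-level features would then feed the sufficient-statistics computation of the i-vector front end.

# Minimal sketch of bottleneck-feature extraction (assumed configuration).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical topology: 11 stacked 39-dim spectral frames -> two 1024-unit
# hidden layers -> a narrow 64-dim bottleneck layer. The speaker-classification
# output layer is omitted because only the bottleneck activations are needed.
layer_dims = [11 * 39, 1024, 1024, 64]
weights = [rng.standard_normal((m, n)) * 0.01
           for m, n in zip(layer_dims[:-1], layer_dims[1:])]
biases = [np.zeros(n) for n in layer_dims[1:]]

def bottleneck_features(frames):
    """Forward frames (num_frames x input_dim) through the DNN and return the
    linear outputs of the bottleneck layer, one feature vector per frame."""
    h = frames
    for w, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ w + b, 0.0)       # hidden layers with ReLU
    return h @ weights[-1] + biases[-1]      # bottleneck layer, no nonlinearity

# Example: one utterance of 200 stacked spectral frames -> 200 x 64 features
utterance = rng.standard_normal((200, 11 * 39))
print(bottleneck_features(utterance).shape)  # (200, 64)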
Received: 2016-06-20      Published: 2016-11-15
CLC number: TP391.4
Corresponding author: LIU Jia, Professor, E-mail: liuj@tsinghua.edu.cn
Cite this article:
TIAN Yao, CAI Meng, HE Liang, LIU Jia. Speaker recognition system based on deep neural networks and bottleneck features. Journal of Tsinghua University (Science and Technology), 2016, 56(11): 1143-1148.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2016.26.002  or  http://jst.tsinghuajournals.com/CN/Y2016/V56/I11/1143
Fig. 1 Flowchart of sufficient statistics extraction
Fig. 2 Structure of the DNN used to extract the bottleneck features
Fig. 3 Flowchart of sufficient statistics extraction based on bottleneck features
Table 1 Performance comparison of the UBM/i-vector, DNN/i-vector, and BN/i-vector systems
Fig. 4 DET curves of the UBM/i-vector, DNN/i-vector, and BN-layer1/i-vector systems
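The sufficient statistics referred to in Figs. 1 and 3 are, in the standard i-vector formulation (the notation below is generic and not taken from the paper), the zeroth- and first-order Baum-Welch statistics accumulated per component c:

N_c = \sum_t \gamma_t(c), \qquad F_c = \sum_t \gamma_t(c)\,\bigl(y_t - m_c\bigr),

where \gamma_t(c) is the frame-level posterior of component c (from a UBM in the UBM/i-vector system, or from DNN senone posteriors in the DNN/i-vector system), y_t is the frame-level feature, and m_c is the component mean; in the proposed BN/i-vector system, y_t is the bottleneck feature of frame t rather than the short-term spectral feature.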