Journal of Tsinghua University (Science and Technology), 2017, Vol. 57, Issue 1: 95-99. DOI: 10.16511/j.cnki.qhdxxb.2017.21.018
Automation
Improved pitch extraction algorithm for speech processing
CHEN Xiao, XU Bo
Interactive Digital Media Technology Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Abstract: This paper presents an improved pitch (fundamental frequency) extraction algorithm, based on the auto-correlation function, for spoken language processing. Starting from the original auto-correlation method, the algorithm takes three measures: it uses texture features of the speech spectrum to increase the weight of correct pitch values, it enlarges the search space by keeping more pitch candidates, and it restricts the search path with reliable seeds. Together, these measures raise the proportion and weight of correct pitch values in the search space and thereby optimize it, improving on the performance of the original algorithm. Experiments on the Keele and FDA datasets show that, relative to the original algorithm, the voiced error rate is reduced by 28.74% and the overall error rate by 5.53%, making the algorithm better suited to spoken language processing.
Key words: speech signal processing; pitch extraction; auto-correlation function
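For orientation, the baseline that the abstract builds on can be sketched as follows. This is a generic auto-correlation pitch candidate search for a single frame, not the authors' improved algorithm: it omits the spectral texture weighting, the reliable seeds, and the path search across frames. The function name `autocorr_pitch_candidates`, the 75-500 Hz search range, and the default of three candidates are illustrative assumptions, not values from the paper.

```python
import math


def autocorr_pitch_candidates(frame, sample_rate, f_min=75.0, f_max=500.0,
                              n_candidates=3):
    """Return up to n_candidates (frequency_hz, strength) pairs, strongest first.

    Generic baseline sketch: peaks of the normalized auto-correlation of one
    frame. The search range and candidate count are assumed defaults.
    """
    n = len(frame)
    if n == 0:
        return []
    # Remove the DC offset so it does not inflate the correlation.
    mean = sum(frame) / n
    x = [s - mean for s in frame]
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return []  # silent frame: no pitch candidates

    # Convert the frequency search range into a lag range (in samples).
    lag_min = max(2, int(sample_rate / f_max))
    lag_max = min(n - 2, int(sample_rate / f_min))

    def acf(lag):
        # Auto-correlation at this lag, normalized by the frame energy.
        return sum(x[i] * x[i + lag] for i in range(n - lag)) / energy

    # Evaluate one extra lag on each side so boundary lags can be tested.
    r = {lag: acf(lag) for lag in range(lag_min - 1, lag_max + 2)}

    # Every local maximum is a pitch candidate, not only the global one.
    peaks = [(sample_rate / lag, r[lag])
             for lag in range(lag_min, lag_max + 1)
             if r[lag] > r[lag - 1] and r[lag] >= r[lag + 1]]
    peaks.sort(key=lambda p: p[1], reverse=True)
    return peaks[:n_candidates]


if __name__ == "__main__":
    # Synthetic 200 Hz tone, 50 ms at 8 kHz (illustrative input only).
    tone = [math.sin(2 * math.pi * 200.0 * i / 8000) for i in range(400)]
    for freq, strength in autocorr_pitch_candidates(tone, 8000):
        print(f"{freq:.1f} Hz  strength={strength:.2f}")
```

Keeping several candidates per frame, rather than only the strongest peak, gives a later path search room to recover when a subharmonic or harmonic peak outranks the true pitch in a given frame. How the paper weights and selects among the candidates is not specified in this abstract.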
Received: 2016-07-09      Published: 2017-01-20
CLC number: TN912.3
Corresponding author: XU Bo, Researcher,     E-mail:
Cite this article: CHEN Xiao, XU Bo. Improved pitch extraction algorithm for speech processing[J]. Journal of Tsinghua University (Science and Technology), 2017, 57(1): 95-99.
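Fig. 1 Improvements to the pitch extraction algorithm
Fig. 2 Spectrum of a speech signal and its envelope
Fig. 3 Spectrum of a noise signal and its envelope
Fig. 4 Flowchart of the improved pitch extraction algorithm
Table 1 Algorithm parameter settings
Table 2 Performance of the algorithm on the Keele dataset
Table 3 Performance of the algorithm on the FDA dataset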