Journal of Tsinghua University(Science and Technology), 2017, Vol. 57, Issue 1: 95-99. DOI: 10.16511/j.cnki.qhdxxb.2017.21.018
AUTOMATION
Improved pitch extraction algorithm for speech processing
CHEN Xiao, XU Bo
Interactive Digital Media Technology Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Abstract  This paper presents an improved pitch extraction algorithm for speech processing based on the auto-correlation function. The original auto-correlation algorithm is optimized in three ways: a texture feature is used to increase the weights of the correct pitch values, more candidate pitch values are used to enlarge the search space, and the search path is restricted to reliable pitch values. Together, these three measures control the weight and proportion of the correct pitch values in the search space and thereby optimize it. The algorithm was evaluated on the Keele and FDA databases. Relative to the original algorithm, the voiced error is reduced by 28.74% and the pitch tracking error by 5.53%, which makes the improved algorithm better suited to speech processing.
Keywords: speech signal processing; pitch extraction; auto-correlation function
CLC number (ZTFLH): TN912.3
Issue Date: 15 January 2017
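
For readers unfamiliar with the baseline that the abstract refers to, the following is a minimal sketch of plain auto-correlation pitch estimation on a single frame. It is not the improved algorithm of the paper: the texture-feature weighting, the enlarged candidate set, and the restricted search path are not reproduced here. The function name `autocorr_pitch`, the parameters `f0_min` and `f0_max`, and their default values are assumptions chosen for illustration.

```python
def autocorr_pitch(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate the pitch (Hz) of one frame from its autocorrelation peak.

    Returns (f0_hz, peak_value); (0.0, 0.0) means no usable peak was found.
    Plain baseline only; search-range defaults are illustrative assumptions.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC offset
    energy = sum(s * s for s in x)         # autocorrelation at lag 0
    if energy == 0.0:
        return 0.0, 0.0                    # silent frame

    # Convert the allowed pitch range into a lag range (in samples).
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(n - 1, int(sample_rate / f0_min))

    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag, normalized by the lag-0 energy.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_val:
            best_lag, best_val = lag, r

    if best_lag == 0:
        return 0.0, 0.0
    return sample_rate / best_lag, best_val
```

Calling `autocorr_pitch(frame, 8000)` on a 40 ms frame returns a pitch estimate together with the normalized peak height. A simple tracker would typically compare that peak against a threshold to decide whether the frame is voiced, and would keep several candidate lags per frame rather than only the best one, which is the kind of search space the abstract describes optimizing.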
Cite this article:   
CHEN Xiao, XU Bo. Improved pitch extraction algorithm for speech processing[J]. Journal of Tsinghua University(Science and Technology), 2017, 57(1): 95-99.
URL:  
http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2017.21.018     OR     http://jst.tsinghuajournals.com/EN/Y2017/V57/I1/95