Abstract: This paper presents an improved pitch extraction algorithm for speech processing based on the auto-correlation function. The original auto-correlation algorithm is optimized in three ways: a texture feature is used to increase the weights of the correct pitch candidates, more candidate pitch values are used to enlarge the search space, and the search path is restricted to reliable pitch values. Together, these measures control the weight and proportion of correct pitch values in the search space and thereby optimize it. The algorithm was evaluated on the Keele and FDA databases. Relative to the original algorithm, the voiced error is reduced by 28.74% and the pitch tracking error is reduced by 5.53%, so the improved algorithm is better suited to speech processing.
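As an illustration of the baseline that the abstract builds on, the following is a minimal sketch of auto-correlation pitch candidate extraction for a single frame. It is not the authors' implementation: the function name `acf_pitch_candidates`, the search range `f0_min`/`f0_max`, and the candidate count `n_candidates` are assumptions chosen for the example, and the texture-feature weighting and search-path restriction described above are not included.

```python
import numpy as np


def acf_pitch_candidates(frame, sample_rate, f0_min=75.0, f0_max=500.0,
                         n_candidates=3):
    """Return up to n_candidates (f0_hz, strength) pairs, strongest first.

    Hypothetical helper for illustration only; all parameter defaults
    are assumptions, not values taken from the paper.
    """
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()  # remove DC offset before correlating

    # Full auto-correlation; keep only the non-negative lags.
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    if acf[0] <= 0.0:
        return []  # silent frame: no energy, no candidates
    acf = acf / acf[0]  # normalize so that lag 0 has strength 1

    # Convert the F0 search range into a lag range (in samples).
    lag_min = max(int(sample_rate / f0_max), 1)
    lag_max = min(int(sample_rate / f0_min), len(acf) - 2)

    # Every local maximum of the auto-correlation is a pitch candidate.
    candidates = []
    for lag in range(lag_min, lag_max + 1):
        if acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]:
            candidates.append((sample_rate / lag, float(acf[lag])))

    # Keep the strongest few; a tracker would later choose among them.
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:n_candidates]
```

Keeping several candidates per frame, rather than only the strongest peak, is what gives a later path search something to choose from. For a 40 ms frame of a 200 Hz sine sampled at 16 kHz, the strongest candidate returned by this sketch is close to 200 Hz, with the weaker candidate at the doubled period (about 100 Hz).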