This paper presents an improved autocorrelation-based pitch (fundamental frequency) extraction algorithm for spoken-language processing. Building on the original autocorrelation function method, the algorithm applies three measures: it uses texture features of the speech spectrum to increase the weight of correct pitch values, it enlarges the search space by keeping more pitch candidates, and it restricts the search path using reliable seed values. Together these measures raise the proportion and weight of correct pitch values in the search space, which optimizes the search and improves on the performance of the original algorithm. Experiments on the Keele and FDA databases show that, relative to the original algorithm, the voiced error rate is reduced by 28.74% and the overall error rate by 5.53%, making the algorithm better suited to spoken-language processing.
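As a rough illustration of the baseline that the abstract builds on, the sketch below extracts several pitch candidates from one frame using a normalized autocorrelation function. It is not the authors' implementation: the function name `acf_pitch_candidates`, the 75-500 Hz search range, and the default of three candidates are assumptions chosen for the example, and the texture-based weighting and seed-constrained path search described above are not reproduced here.

```python
import numpy as np


def acf_pitch_candidates(frame, sample_rate, f0_min=75.0, f0_max=500.0,
                         n_candidates=3):
    """Return up to n_candidates (f0_hz, strength) pairs, strongest first.

    Hypothetical helper for illustration only; parameter defaults are
    assumptions, not values taken from the paper.
    """
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()  # remove DC offset before correlating

    # One-sided autocorrelation: index k holds the value at lag k.
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    if acf[0] <= 0.0:
        return []  # silent frame: no energy, no candidates
    acf = acf / acf[0]  # normalize so that lag 0 equals 1

    # Convert the allowed pitch range into a lag range (in samples).
    lag_min = max(int(sample_rate / f0_max), 1)
    lag_max = min(int(sample_rate / f0_min), len(x) - 2)

    # Every local maximum of the ACF inside the lag range is a candidate.
    lags = [k for k in range(lag_min, lag_max + 1)
            if acf[k] > acf[k - 1] and acf[k] >= acf[k + 1]]

    # Keep the strongest few instead of only the single best peak, so a
    # later path search has more than one option per frame.
    lags.sort(key=lambda k: acf[k], reverse=True)
    return [(sample_rate / k, float(acf[k])) for k in lags[:n_candidates]]


if __name__ == "__main__":
    sr = 16000
    t = np.arange(640) / sr               # one 40 ms frame
    tone = np.sin(2 * np.pi * 200.0 * t)  # synthetic 200 Hz "voiced" frame
    for f0, strength in acf_pitch_candidates(tone, sr):
        print(f"{f0:.1f} Hz (strength {strength:.3f})")
```

Keeping several candidates per frame, rather than only the highest peak, is what gives a subsequent tracking stage room to recover when the strongest peak falls on a pitch-halving or pitch-doubling lag.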