1. The Institute of Acoustics, University of Chinese Academy of Sciences, Beijing 100190, China; 2. Shanghai Acoustics Laboratory, Chinese Academy of Sciences, Shanghai 200815, China
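Abstract: Traditional generalized cross-correlation algorithms that rely on the correlation peak degrade sharply in reverberant environments. Several precedence-effect models have been proposed to improve their performance, but these models are computationally complex and sensitive to threshold selection. This paper first uses the eigenvalues of the covariance matrix to update the coherence function of the speech and the coherence function of the noise separately, and then matches the speech coherence function to the ideal coherence function to estimate the time delay. The estimated time delay and the noise coherence function are then used to estimate the coherent-to-diffuse power ratio (CDR), and the CDR estimated in real time is finally used for reverberation suppression. Experimental results show that the localization error of the proposed method is clearly lower than that of traditional methods, and that the perceptual evaluation of speech quality (PESQ) score after reverberation suppression is higher than that of the compared algorithms.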
Abstract: The performance of traditional cross-correlation based time-delay estimation methods degrades sharply in reverberant environments. Precedence effect models have been proposed to improve cross-correlation methods, but these models are computationally complex and sensitive to threshold selection. This paper describes a method that first updates the coherence function of the speech and the coherence function of the noise separately, based on the eigenvalues of the covariance matrix. The speech coherence function is then matched to the ideal coherence function to estimate the time delay. Finally, the estimated time delay and the noise coherence function are applied to a coherent-to-diffuse power ratio (CDR) estimator, and the CDR estimated in real time is used for reverberation suppression. Tests show that this scheme has a lower localization error than traditional methods and achieves higher PESQ (perceptual evaluation of speech quality) scores than other CDR estimators.
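The two coherence-based steps named in the abstract (matching a measured coherence to an ideal coherence to obtain a time delay, then deriving a CDR from the signal and noise coherences) can be illustrated with a minimal sketch. This is not the paper's algorithm: the eigenvalue-based update of the speech and noise coherence functions is not reproduced here. The sketch assumes a two-microphone setup, an ideal plane-wave coherence of the form exp(j2πfτ), an ideal spherically diffuse noise field with sinc-shaped coherence, and the standard mixing model Γx = (CDR·Γs + Γn)/(CDR + 1). All function names, the 0.1 m microphone spacing, the 343 m/s speed of sound, and the candidate delay grid are hypothetical choices made for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def coherence(X1, X2):
    """Complex coherence per bin from two STFTs of shape (bins, frames)."""
    cross = np.mean(X1 * np.conj(X2), axis=1)
    p1 = np.mean(np.abs(X1) ** 2, axis=1)
    p2 = np.mean(np.abs(X2) ** 2, axis=1)
    return cross / np.sqrt(p1 * p2 + 1e-12)


def ideal_coherence(freqs, delay):
    """Coherence of one plane wave; delay = arrival at mic 2 minus mic 1 (s)."""
    return np.exp(2j * np.pi * freqs * delay)


def estimate_delay(gamma, freqs, candidates):
    """Return the candidate delay whose ideal coherence best matches gamma."""
    ideal = np.exp(2j * np.pi * np.outer(candidates, freqs))  # (cands, bins)
    scores = np.real(ideal.conj() @ gamma)  # correlation with each candidate
    return candidates[np.argmax(scores)]


def diffuse_coherence(freqs, mic_distance):
    """Coherence of an ideal spherically diffuse field (real-valued sinc)."""
    # np.sinc(x) is sin(pi x)/(pi x), so this equals sin(2 pi f d/c)/(2 pi f d/c).
    return np.sinc(2.0 * freqs * mic_distance / SPEED_OF_SOUND)


def cdr_from_coherence(gamma_x, gamma_s, gamma_n, eps=1e-9):
    """Solve gamma_x = (CDR*gamma_s + gamma_n)/(CDR + 1) for CDR, clipped at 0."""
    denom = gamma_x - gamma_s
    denom = np.where(np.abs(denom) < eps, eps, denom)  # avoid division by zero
    return np.maximum(np.real((gamma_n - gamma_x) / denom), 0.0)


# Synthetic check: build a mixture coherence with a known delay and CDR.
freqs = np.linspace(100.0, 4000.0, 64)
true_delay, true_cdr, mic_distance = 2.5e-4, 4.0, 0.1
gamma_s = ideal_coherence(freqs, true_delay)
gamma_n = diffuse_coherence(freqs, mic_distance)
gamma_x = (true_cdr * gamma_s + gamma_n) / (true_cdr + 1.0)

candidates = np.linspace(-3e-4, 3e-4, 25)  # hypothetical delay grid (s)
delay_hat = estimate_delay(gamma_x, freqs, candidates)
cdr_hat = cdr_from_coherence(gamma_x, ideal_coherence(freqs, delay_hat), gamma_n)
print(delay_hat, cdr_hat.mean())
```

In this sketch the delay search is a plain grid search over candidate delays, and the CDR comes from inverting the mixing model directly. Practical estimators are usually more robust to estimation noise in the measured coherence than this direct inversion.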