Real time estimation and tracking method for the direction of arrival of single sound source based on Kalman filtering and frequency focusing

ZHOU Jing, BAO Changchun, DUAN Haiwei

Journal of Tsinghua University (Science and Technology), 2024, Vol. 64, Issue (11): 1902-1910. DOI: 10.16511/j.cnki.qhdxxb.2024.26.043

SPECIAL SECTION: MAN-MACHINE SPEECH COMMUNICATION

Abstract

[Objective] Estimation of the direction of arrival (DOA) is critical in spatial audio coding, speech enhancement, sound field synthesis, and sound source imaging. Commonly used signal-model-based DOA estimation methods, such as multiple signal classification (MUSIC), can effectively estimate the DOA in noise-free, anechoic scenarios. However, real-world environments invariably contain noise and reverberation, particularly in far-field speech communication scenarios, which are characterized by low signal-to-noise ratios (SNRs) and strong reverberation. Furthermore, the sound source may be moving. These factors considerably degrade the performance of signal-model-based DOA estimation methods. To address this issue, this paper introduces a real-time method for estimating and tracking the DOA of a single sound source based on Kalman filtering and frequency focusing.

[Methods] The proposed method consists of three procedures: denoising, dereverberation, and DOA estimation. For the denoising procedure, an objective function that minimizes the error of the denoised signal is established. This function is optimized with a Kalman filter, and the denoised signal is obtained as the Kalman-gain-based posterior estimate. For the dereverberation procedure, an objective function that minimizes the error of the multichannel linear prediction (MCLP) coefficients is established on the basis of the autoregressive coefficients of the late reverberation components. This function is solved with a second Kalman filter to obtain the MCLP coefficients. The DOA estimation procedure uses a frequency-focusing-based steered response power (FF-SRP) method, which can avoid the diffusion of signal components in subspace decomposition. In particular, a structure is designed that tightly intertwines these three procedures, increasing the contribution of the denoising and dereverberation results to DOA estimation. In this structure, a propagation matrix links the denoising and dereverberation procedures, creating a causal iteration between them. In addition, a minimum variance distortionless response (MVDR) beamformer replaces the multichannel Wiener filter to obtain a prior estimate of the covariance matrix of the target signal. The MVDR beamformer offers two advantages: it reduces the distortion of the target signal, and it couples the DOA estimation procedure with the denoising procedure, thereby promoting a causal, orderly iteration among the three procedures.

[Results] Experiments were conducted using a microphone array signal simulator and the TIMIT corpus. The mean absolute error (MAE) of the estimated DOA and the DOA trajectory of a moving speaker served as the evaluation measures. The experiments yielded three key findings: (1) As the reverberation time (RT60) increased, the MAE of all methods increased, demonstrating that reverberation significantly affects DOA estimation performance. (2) Compared with the reference methods, the proposed method consistently achieved the lowest MAE under different RT60s and SNRs, indicating higher DOA estimation accuracy. (3) For the DOA trajectory, the proposed method again outperformed the reference methods by producing the smallest error, indicating better DOA tracking performance.

[Conclusions] By integrating denoising, dereverberation, and DOA estimation through a causal, recursive iteration structure, the performance of DOA estimation and tracking can be significantly enhanced. The proposed method effectively mitigates the detrimental impact of noise and reverberation on DOA estimation and tracking accuracy in single-source scenarios.
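The abstract states only that the denoised signal is obtained as a Kalman-gain-based posterior estimate. As a hedged illustration of that predict/update cycle, the sketch below runs a scalar Kalman filter on a single STFT frequency bin. It assumes a first-order autoregressive model for the clean coefficient and a known, constant noise variance; the function name and the parameters a, q, and r are hypothetical, and the paper's multichannel formulation is not reproduced.

```python
def kalman_denoise(observations, a=0.95, q=0.1, r=1.0):
    """Scalar Kalman filter for one STFT frequency bin (illustrative model).

    State model:       s[t] = a * s[t-1] + w[t],  var(w) = q
    Observation model: y[t] = s[t] + v[t],        var(v) = r
    Returns the posterior estimates of s[t] and the Kalman gains used.
    """
    s_post, p_post = 0j, 1.0  # initial state estimate and its error variance
    estimates, gains = [], []
    for y in observations:
        # Prediction (prior) step from the autoregressive state model
        s_prior = a * s_post
        p_prior = a * a * p_post + q
        # Kalman gain weighs prior uncertainty against the noise variance
        k = p_prior / (p_prior + r)
        # Posterior step: correct the prior with the innovation y - s_prior
        s_post = s_prior + k * (y - s_prior)
        p_post = (1.0 - k) * p_prior
        estimates.append(s_post)
        gains.append(k)
    return estimates, gains
```

Because the gain always lies between 0 and 1, each posterior estimate is a blend of the model prediction and the noisy observation, weighted by their relative uncertainties.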
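For the dereverberation procedure, the abstract says the MCLP coefficients are estimated with a second Kalman filter. The sketch below tracks MCLP coefficients in a single frequency bin under stated assumptions: a random-walk model for the coefficients, a fixed prediction delay, and a constant observation variance r standing in for the power of the desired early component (in practice that power is time-varying and must be estimated). The names kalman_mclp, stack_regressor, delay, taps, q, and r are hypothetical, and the propagation matrix that couples this step with denoising in the paper is not modeled.

```python
def stack_regressor(channels, t, delay, taps):
    """Stack delayed samples y_m(t - delay - l) from all channels (assumed layout)."""
    return [ch[t - delay - l] for ch in channels for l in range(taps)]


def kalman_mclp(channels, delay=1, taps=2, q=1e-6, r=1.0):
    """Track MCLP coefficients h in one frequency bin with a Kalman filter.

    Assumed observation model: y_0(t) = sum_i h_i * x_i(t) + s(t), where
    x(t) holds delayed multichannel samples and s(t) is the desired early
    component, treated as observation noise with variance r.
    """
    n = len(channels) * taps
    h = [0j] * n
    p = [[(1.0 if i == j else 0.0) + 0j for j in range(n)] for i in range(n)]
    early = []
    for t in range(delay + taps - 1, len(channels[0])):
        x = stack_regressor(channels, t, delay, taps)
        # Prediction step: the random-walk state model adds q to the diagonal
        for i in range(n):
            p[i][i] += q
        # A priori early-component estimate: reference minus predicted late reverb
        e = channels[0][t] - sum(hi * xi for hi, xi in zip(h, x))
        # Kalman gain K = P x* / (x^T P x* + r)
        px = [sum(p[i][j] * x[j].conjugate() for j in range(n)) for i in range(n)]
        denom = sum(x[i] * px[i] for i in range(n)).real + r
        k = [v / denom for v in px]
        # Posterior update of the coefficients and their error covariance
        h = [hi + ki * e for hi, ki in zip(h, k)]
        xp = [sum(x[i] * p[i][j] for i in range(n)) for j in range(n)]
        p = [[p[i][j] - k[i] * xp[j] for j in range(n)] for i in range(n)]
        early.append(e)
    return h, early
```

The prediction delay keeps the regressors from covering the early part of the room response, so only late reverberation is predicted and subtracted.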
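The FF-SRP step builds on the conventional steered response power (SRP) idea: steer the array toward each candidate direction and select the direction with the largest output power. The sketch below shows only that conventional wideband SRP scan, summing d^H R d over frequencies; the frequency-focusing transform that the paper applies is deliberately omitted. The uniform linear array geometry, the 343 m/s speed of sound, the 1° search grid, and all function names are assumptions for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)


def ula_steering(freq_hz, theta_rad, num_mics, spacing_m):
    """Far-field steering vector of a uniform linear array (assumed geometry)."""
    tau = spacing_m * math.cos(theta_rad) / SPEED_OF_SOUND
    return [cmath.exp(-2j * math.pi * freq_hz * m * tau) for m in range(num_mics)]


def quad_form(d, r):
    """Compute the real-valued power d^H R d for a Hermitian matrix R."""
    n = len(d)
    return sum(d[i].conjugate() * r[i][j] * d[j]
               for i in range(n) for j in range(n)).real


def srp_doa(covs, freqs, num_mics, spacing_m, grid_deg=range(0, 181)):
    """Return the grid angle (degrees) whose steered power, summed over frequencies, is largest."""
    best_angle, best_power = None, -math.inf
    for angle in grid_deg:
        theta = math.radians(angle)
        power = sum(quad_form(ula_steering(f, theta, num_mics, spacing_m), r)
                    for f, r in zip(freqs, covs))
        if power > best_power:
            best_angle, best_power = angle, power
    return best_angle
```

Summing power across frequencies without focusing works in this idealized case, but in reverberant rooms the per-frequency peaks spread out. Aligning the frequency bins before the scan is what frequency focusing is meant to address.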
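The abstract replaces the multichannel Wiener filter with an MVDR beamformer to obtain a prior estimate of the target covariance. The sketch below computes the textbook MVDR weights w = R^{-1} d / (d^H R^{-1} d) and the output power 1 / (d^H R^{-1} d). Scaling d d^H by that power to form a rank-one target covariance is an assumption for illustration; the paper's exact construction is not given in the abstract, and the helper names are hypothetical.

```python
def solve_linear(a, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[piv] = m[piv], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                for c in range(col, n + 1):
                    m[row][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def mvdr(cov, steer):
    """Return MVDR weights and output power 1 / (d^H R^{-1} d)."""
    rinv_d = solve_linear(cov, steer)
    denom = sum(d.conjugate() * v for d, v in zip(steer, rinv_d))
    weights = [v / denom for v in rinv_d]
    return weights, 1.0 / denom.real


def rank_one_target_cov(cov, steer):
    """Hypothetical prior target covariance: MVDR output power times d d^H."""
    _, power = mvdr(cov, steer)
    return [[power * di * dj.conjugate() for dj in steer] for di in steer]
```

For a covariance of the form sigma_s^2 d d^H + sigma_n^2 I with unit-modulus steering entries, the MVDR output power is sigma_s^2 + sigma_n^2 / M. The residual noise term shrinks as the number of microphones M grows, while the constraint w^H d = 1 leaves the target undistorted.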
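The results are reported as the mean absolute error (MAE) of the estimated DOA. A minimal sketch of that metric follows. Wrapping each error onto the circle is an assumption that only matters for full-circle azimuths; over a linear array's 0° to 180° range the wrap has no effect. The helper names are hypothetical.

```python
def angular_error_deg(est, ref):
    """Absolute azimuth error in degrees, wrapped to the range [0, 180]."""
    diff = abs(est - ref) % 360.0
    return min(diff, 360.0 - diff)


def doa_mae(estimates, references):
    """Mean absolute DOA error over frames (hypothetical helper)."""
    if len(estimates) != len(references) or not estimates:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(angular_error_deg(e, r) for e, r in zip(estimates, references))
    return total / len(estimates)
```

For a moving speaker, the same function can be applied frame by frame against the reference trajectory to summarize tracking error.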

Key words

direction of arrival estimation / multichannel linear prediction / Kalman filtering / frequency focusing / dereverberation

Cite this article

ZHOU Jing, BAO Changchun, DUAN Haiwei. Real time estimation and tracking method for the direction of arrival of single sound source based on Kalman filtering and frequency focusing[J]. Journal of Tsinghua University (Science and Technology). 2024, 64(11): 1902-1910 https://doi.org/10.16511/j.cnki.qhdxxb.2024.26.043