LI Haifeng1, FANG Chunying1,2, MA Lin1, ZHANG Mancai1, SUN Jiayin1
1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China;
2. School of Computer and Information Engineering, Heilongjiang Institute of Science and Technology, Harbin 150027, China
Abstract: Pathological speech is difficult to analyze because it is non-stationary and prone to abrupt changes. This study combines the S transform, which offers good time-frequency resolution and time-frequency localization, with the Mel-scale characteristics of human hearing to compute Mel S-transform cepstrum coefficients (MSCC) that highlight pathological lesions of the vocal organs. The MSCC are compared with the classical Mel frequency cepstrum coefficients (MFCC) and with common acoustic features on the NCSC corpus. The comparison shows that the MSCC better portray the dynamics of pathological speech and identify pathological speech information more quickly. In addition, the MSCC also perform well in classification when features are selected with the F-Score method combined with the particle swarm optimization algorithm. The MSCC therefore provide accurate analyses of pathological speech characteristics for clinical diagnosis.
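The abstract does not give the formulas behind the MSCC, so the following is only a minimal sketch of how such a feature might be computed. It assumes the usual MFCC pipeline (Mel filterbank, logarithm, DCT) with the short-time Fourier spectrum replaced by a discrete S-transform spectrum. The function names (`s_transform`, `mel_filterbank`, `mscc`), the filter count, the number of coefficients, and the Mel formula are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def s_transform(x):
    """Discrete S transform of a real 1-D frame, computed in the frequency domain.

    Returns an array of shape (N//2 + 1, N): rows are frequency bins, columns are time.
    """
    x = np.asarray(x, dtype=float)
    N = x.size
    X = np.fft.fft(x)
    m = np.fft.fftfreq(N) * N            # signed frequency shifts: 0, 1, ..., -2, -1
    n_freqs = N // 2 + 1
    S = np.empty((n_freqs, N), dtype=complex)
    S[0] = x.mean()                      # zero-frequency voice is the signal mean
    for n in range(1, n_freqs):
        # Gaussian window whose width scales with the analysed frequency n
        gauss = np.exp(-2.0 * np.pi**2 * m**2 / n**2)
        # np.roll(X, -n)[m] == X[(m + n) mod N]
        S[n] = np.fft.ifft(np.roll(X, -n) * gauss)
    return S

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters equally spaced on an assumed Mel scale (2595*log10(1 + f/700))."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda mel: 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2))
    freqs = np.arange(n_fft // 2 + 1) * fs / n_fft
    fb = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def mscc(frame, fs, n_filters=20, n_ceps=12):
    """Hypothetical Mel S-transform cepstrum: one coefficient vector per time sample."""
    power = np.abs(s_transform(frame)) ** 2              # (n_freqs, N)
    fb = mel_filterbank(n_filters, len(frame), fs)       # (n_filters, n_freqs)
    log_mel = np.log(fb @ power + 1e-12)                 # small offset avoids log(0)
    i = np.arange(n_filters)
    k = np.arange(n_ceps)[:, None]
    dct = np.cos(np.pi * k * (i + 0.5) / n_filters)      # unnormalised DCT-II basis
    return dct @ log_mel                                 # (n_ceps, N)
```

Because the S transform here keeps one spectrum per time sample, memory grows with the square of the frame length, so a sketch like this is only practical on short frames (for example, a few hundred samples).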
LI Haifeng, FANG Chunying, MA Lin, ZHANG Mancai, SUN Jiayin. S transform feature for pathological speech[J]. Journal of Tsinghua University (Science and Technology), 2016, 56(7): 765-771. (in Chinese)