S transform feature for pathological speech

doi:10.16511/j.cnki.qhdxxb.2016.21.042


Abstract: Pathological speech is strongly non-stationary and prone to abrupt changes, which makes it difficult to analyze. The S transform offers good time-frequency resolution and time-frequency localization. This paper combines the S transform with the Mel characteristics of human hearing and proposes a pathological speech feature, the Mel S-transform cepstrum coefficients (MSCC), that highlights lesions of the vocal organs. On the NCSC corpus, comparisons with the classical cepstral speech feature MFCC (Mel frequency cepstrum coefficients) and with commonly used acoustic features show that MSCC better captures the dynamic, rapidly changing pathological information in speech. In addition, when the F-Score method is used to evaluate features and particle swarm optimization is used for feature selection, MSCC shows better classification performance. MSCC can therefore support clinical diagnosis with high-precision analysis of pathological speech.

Key words: pathological speech; S transform; Mel cepstrum; Mel S-transform cepstrum coefficients (MSCC) feature
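The abstract describes MSCC only at a high level: an S transform, a Mel-scale weighting, and a cepstrum. The sketch below is one possible reading of that pipeline, not the authors' implementation. It assumes the standard discrete S transform of Stockwell et al. computed in the frequency domain, a time average of the S-transform power within each frame, triangular Mel filters, and an unnormalized DCT-II. The function names (`s_transform`, `mel_filterbank`, `mscc`) and all parameter values (frame length, hop, filter and coefficient counts) are illustrative assumptions.

```python
import numpy as np


def s_transform(x):
    """Discrete S transform of a real 1-D signal.

    Returns a complex array of shape (N // 2 + 1, N): rows are frequency
    indices 0..N/2, columns are time indices.
    """
    x = np.asarray(x, dtype=float)
    n_samples = x.size
    spectrum = np.fft.fft(x)
    # Signed integer frequency indices: 0, 1, ..., -2, -1.
    m = np.fft.fftfreq(n_samples) * n_samples
    n_freqs = n_samples // 2 + 1
    out = np.empty((n_freqs, n_samples), dtype=complex)
    out[0] = x.mean()  # zero-frequency row is the signal mean
    for n in range(1, n_freqs):
        # Frequency-domain Gaussian; its width grows with the frequency n.
        gauss = np.exp(-2.0 * np.pi ** 2 * m ** 2 / n ** 2)
        # Shift the spectrum by n, weight it, and return to the time axis.
        out[n] = np.fft.ifft(np.roll(spectrum, -n) * gauss)
    return out


def mel_filterbank(n_filters, frame_len, sample_rate):
    """Triangular Mel filters over the frame_len // 2 + 1 S-transform rows."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(mel):
        return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

    n_bins = frame_len // 2 + 1
    freqs = np.arange(n_bins) * sample_rate / frame_len
    edges = mel_to_hz(
        np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    )
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def mscc(signal, sample_rate, frame_len=256, hop=128, n_filters=20, n_ceps=12):
    """MSCC-style features, one row of n_ceps coefficients per frame."""
    signal = np.asarray(signal, dtype=float)
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    # Unnormalized DCT-II basis mapping log Mel energies to cepstral terms.
    k = np.arange(n_ceps)[:, None]
    i = np.arange(n_filters)[None, :]
    dct_basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n_filters))
    feats = []
    for start in range(0, signal.size - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        power = np.abs(s_transform(frame)) ** 2
        # Assumption: collapse the time axis by averaging within the frame.
        band_energy = bank @ power.mean(axis=1)
        # Small floor avoids log(0) for empty bands.
        feats.append(dct_basis @ np.log(band_energy + 1e-12))
    return np.array(feats)
```

Calling `mscc(samples, 16000)` on a mono signal sampled at 16 kHz returns one 12-dimensional vector per 256-sample frame. Unlike MFCC, which starts from a fixed-window short-time spectrum, the spectrum here comes from Gaussian windows whose width depends on frequency, which is the property the abstract credits for better tracking of rapid changes.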

Received: 2015-07-10; Published: 2016-07-15

CLC number: TN912.34

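Funding: National Natural Science Foundation of China, General Program (61171186, 61271345); Open Fund of the MOE-Microsoft Key Laboratory of Natural Language Processing and Speech (HIT.KLOF.20110XX); Fundamental Research Funds for the Central Universities (HIT.NSRIF.2012047); Science and Technology Research Project of the Heilongjiang Provincial Department of Education (12533051); Outstanding Young Talent Training Program of Heilongjiang University of Science and Technology (Q20130106)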

Cite this article:

LI Haifeng, FANG Chunying, MA Lin, ZHANG Mancai, SUN Jiayin. S transform feature for pathological speech[J]. Journal of Tsinghua University (Science and Technology), 2016, 56(7): 765-771.

Link to this article:

http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2016.21.042 or http://jst.tsinghuajournals.com/CN/Y2016/V56/I7/765

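Fig. 1 Shapes of the S-transform Gaussian window function at different frequencies

Fig. 2 Time-frequency analysis of a pathological speech segment under different transforms

Fig. 3 Schematic of the S-transform-based speech feature MSCC

Table 1 Distribution of the NCSC corpus

Table 2 Comparison of recognition results with MSCC and MFCC features

Fig. 4 Comparison of MSCC and MFCC

Table 3 Construction of the pathological-voice BAFS

Table 4 Comparison of experimental results with MSCC and BAFP

Table 5 Comparison of experimental results with MSCC+BAFP and PSO-Features

Fig. 5 Number and proportion of MSCC and BAFP features retained in the feature set before and after dimensionality reduction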


