Effective Audio Fingerprint Retrieval Based on the Spectral Sub-band Centroid Feature

doi:10.16511/j.cnki.qhdxxb.2017.25.008

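Abstract: Key audio detection, an important form of audio retrieval, is the task of retrieving occurrences of a query example from an audio database. To address the shortcomings of traditional key audio detection methods in efficiency and robustness, this paper optimizes three stages: pre-processing, fingerprint extraction, and retrieval. In pre-processing, a speech endpoint detection algorithm based on the sub-band energy ratio is adopted, with improvements to the choice of window function and to the sub-band division method. In fingerprint extraction, a seed-fragment selection method is adopted, and the fingerprint extraction method is changed to the spectral sub-band centroid method. In retrieval, a hit-count threshold is set to improve efficiency. Experimental results show that the improved system raises recall, precision, and noise robustness while also improving retrieval efficiency, effectively enhancing overall retrieval performance.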
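The pre-processing step described above can be illustrated with a minimal sketch of sub-band energy ratio endpoint detection in Python. It is not the authors' algorithm: the Hann window, the 300-3400 Hz band, the frame length, the hop, both thresholds, and the helper names (`detect_endpoints`, `subband_energy_ratio`) are hypothetical choices. A frame counts as active only if it carries enough energy and most of that energy lies inside the band, which is what lets a ratio-based detector ignore loud out-of-band noise such as mains hum.

```python
import cmath
import math


def hann(n):
    """Hann window of length n (an assumed choice of window function)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame):
    """Power of DFT bins 0..N/2 via a naive DFT (adequate for short demo frames)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        z = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(z.real * z.real + z.imag * z.imag)
    return spec


def subband_energy_ratio(frame, fs, band=(300.0, 3400.0)):
    """Fraction of windowed frame energy inside `band` (Hz); band edges are hypothetical."""
    n = len(frame)
    spec = power_spectrum([x * w for x, w in zip(frame, hann(n))])
    total = sum(spec)
    if total == 0.0:
        return 0.0
    in_band = sum(p for k, p in enumerate(spec) if band[0] <= k * fs / n <= band[1])
    return in_band / total


def detect_endpoints(signal, fs, frame_len=128, hop=64, ratio_thr=0.6, energy_thr=1e-3):
    """Return (start, end) sample indices spanning all active frames, or None."""
    active = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # Cheap energy gate first; the sub-band ratio then rejects out-of-band noise.
        if energy > energy_thr and subband_energy_ratio(frame, fs) > ratio_thr:
            active.append(start)
    if not active:
        return None
    return active[0], active[-1] + frame_len
```

For example, with fs = 8000, a 1000 Hz tone occupying samples 2000-6000 between stretches of silence yields a span that brackets the tone to within about one frame, while a 50 Hz hum of the same amplitude returns None even though it passes the energy gate.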

Authors: SUN Jiasong (孙甲松), ZHANG Jingyun (张菁芸), YANG Yi (杨毅)

Key words: audio information retrieval; spectral sub-band centroid feature; fingerprint extraction; endpoint detection
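The spectral sub-band centroid (SSC) feature used for fingerprint extraction can be sketched as follows. For sub-band m, the centroid is the power-weighted mean frequency of the DFT bins in that band, C_m = Σ f_k P(k) / Σ P(k), rescaled here to [0, 1] within the band. The log-spaced division of 300-3000 Hz into 8 bands, the frame sizes, and the bit rule are assumptions: because this page does not give the paper's binarization, each bit borrows the sign-of-difference rule of the Philips (Haitsma-Kalker) fingerprint and applies it to SSC values. `ssc_fingerprint` and its helpers are hypothetical names.

```python
import cmath
import math


def power_spectrum(frame):
    """Hann-windowed power spectrum, bins 0..N/2, via a naive DFT."""
    n = len(frame)
    x = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1))) for i, s in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        z = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(z.real * z.real + z.imag * z.imag)
    return spec


def log_band_edges(f_lo, f_hi, n_bands):
    """n_bands + 1 logarithmically spaced band edges in Hz (hypothetical division)."""
    ratio = (f_hi / f_lo) ** (1.0 / n_bands)
    return [f_lo * ratio ** i for i in range(n_bands + 1)]


def subband_centroids(frame, fs, edges):
    """Normalized SSC per band: 0 = lower edge, 1 = upper edge, 0.5 if the band has no power."""
    spec = power_spectrum(frame)
    n = len(frame)
    cents = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        num = den = 0.0
        for k, p in enumerate(spec):
            f = k * fs / n
            if lo <= f < hi:
                num += f * p
                den += p
        cents.append((num / den - lo) / (hi - lo) if den > 0 else 0.5)
    return cents


def ssc_fingerprint(signal, fs, frame_len=256, hop=128, n_bands=8, f_lo=300.0, f_hi=3000.0):
    """One (n_bands - 1)-bit sub-fingerprint per transition between consecutive frames."""
    edges = log_band_edges(f_lo, f_hi, n_bands)
    starts = range(0, len(signal) - frame_len + 1, hop)
    cents = [subband_centroids(signal[s:s + frame_len], fs, edges) for s in starts]
    prints = []
    for prev, cur in zip(cents, cents[1:]):
        bits = 0
        for m in range(n_bands - 1):
            # Bit = sign of the change over time of the centroid difference between adjacent bands.
            delta = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            bits = (bits << 1) | (1 if delta > 0 else 0)
        prints.append(bits)
    return prints
```

Because each centroid is a ratio of two power sums, a pure gain change leaves every C_m unchanged, so scaling the input (for example by 0.5) produces the same fingerprint; this gain invariance is part of the appeal of centroid features.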

Received: 2015-09-29; Published: 2017-04-15

CLC number: TN912.3

Cite this article:

SUN Jiasong, ZHANG Jingyun, YANG Yi. Effective audio fingerprint retrieval based on the spectral sub-band centroid feature[J]. Journal of Tsinghua University (Science and Technology), 2017, 57(4): 382-387.

Link to this article:

http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2017.25.008 or http://jst.tsinghuajournals.com/CN/Y2017/V57/I4/382

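Fig. 1 Audio retrieval steps of the improved system

Fig. 2 Number of differing fingerprint bits between clean and noisy speech, before and after the endpoint detection improvement

Fig. 3 Relationship between the stable region of the sub-band energy feature and the signal energy

Fig. 4 Composition of the test database

Fig. 5 Retrieval performance under different test conditions

Fig. 6 Retrieval performance of the baseline and improved systems on data with different SNRs

Fig. 7 Drop in retrieval accuracy of the baseline and improved systems for data under different transformations

Fig. 8 Retrieval time under different test conditions

Table 1 Overall performance of the baseline and improved systems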
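The figures and Table 1 compare retrieval performance and retrieval time; on the retrieval side, the abstract attributes the efficiency gain to a hit-count threshold. A minimal sketch, assuming a lookup-table search in the style of Haitsma and Kalker: each query sub-fingerprint is looked up exactly in an inverted index, each match votes for a (track, offset) alignment, and only alignments with at least `min_hits` votes proceed to the full bit-error comparison. The index layout, the 32-bit sub-fingerprint width, `min_hits=3`, the 0.35 bit-error-rate cutoff, and the function names are hypothetical; the paper's own values are not given on this page.

```python
from collections import defaultdict


def build_index(database):
    """Map each sub-fingerprint value to its (track, position) occurrences."""
    index = defaultdict(list)
    for track, prints in database.items():
        for pos, fp in enumerate(prints):
            index[fp].append((track, pos))
    return index


def bit_errors(a, b):
    """Total number of differing bits between two aligned sub-fingerprint sequences."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def search(query, database, index, min_hits=3, max_ber=0.35, bits=32):
    """Return (track, offset, hits, ber) matches sorted by bit error rate."""
    votes = defaultdict(int)
    for qpos, fp in enumerate(query):
        for track, pos in index.get(fp, ()):
            offset = pos - qpos
            if offset >= 0:
                votes[(track, offset)] += 1
    matches = []
    for (track, offset), hits in votes.items():
        if hits < min_hits:
            continue  # hit-count threshold: skip the costly full comparison
        ref = database[track][offset:offset + len(query)]
        if len(ref) < len(query):
            continue  # query would run past the end of the reference track
        ber = bit_errors(query, ref) / (bits * len(query))
        if ber <= max_ber:
            matches.append((track, offset, hits, ber))
    return sorted(matches, key=lambda m: m[3])
```

Raising `min_hits` prunes more candidate alignments before the comparison, trading some recall for speed. Applied to fingerprints of clean and noisy versions of the same audio, `bit_errors` also yields the kind of differing-bit count shown in Fig. 2.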

