Effective audio fingerprint retrieval based on the spectral sub-band centroid feature

doi:10.16511/j.cnki.qhdxxb.2017.25.008


Abstract: Key audio detection, an important form of audio retrieval, searches an audio database using a query audio sample, but such searches are neither very efficient nor very robust. This paper optimizes three stages of audio retrieval: pre-processing, fingerprint extraction, and retrieval. The pre-processing stage uses endpoint detection based on the sub-band energy ratio, with a modified window function and measurements of the sub-band divisions. The fingerprint extraction stage uses seed fragments and spectral sub-band centroids. The retrieval stage applies a threshold to the hit counts to improve efficiency. The system improves both the precision and the recall rate, with good noise suppression, so retrieval efficiency and performance are both effectively improved.
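As a rough illustration of the spectral sub-band centroid feature named in the abstract, the sketch below computes the power-weighted mean frequency of each sub-band of one frame. It is not the authors' implementation: the equal-width band split, the naive DFT, the silent-band fallback, and the names `power_spectrum` and `subband_centroids` are all assumptions made for this example, and the paper's own windowing and sub-band divisions are not reproduced here.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive O(N^2) DFT power spectrum for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def subband_centroids(power, sample_rate, n_bands):
    """Power-weighted mean frequency (Hz) of each sub-band.

    Assumption: bands are equal-width in bin index; the paper may divide
    the spectrum differently.
    """
    n_bins = len(power)
    bin_hz = sample_rate / (2 * (n_bins - 1))  # spacing between DFT bins
    centroids = []
    for b in range(n_bands):
        lo = b * n_bins // n_bands
        hi = (b + 1) * n_bins // n_bands
        total = sum(power[lo:hi])
        if total > 0:
            c = sum(k * bin_hz * power[k] for k in range(lo, hi)) / total
        else:
            # Silent band: fall back to the band centre (an assumption).
            c = (lo + hi - 1) / 2 * bin_hz
        centroids.append(c)
    return centroids


# Usage: a 1 kHz tone sampled at 8 kHz, one 64-sample frame, 4 sub-bands.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(64)]
centroids = subband_centroids(power_spectrum(frame), sample_rate, 4)
```

In this usage example the tone falls in the second sub-band, so that band's centroid sits at the tone frequency. A centroid tracks where the energy is concentrated within a band rather than how much energy there is, which is one plausible reason such features tolerate level changes and additive noise.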

Keywords: audio information retrieval; spectral sub-band centroids; fingerprint extraction; endpoint detection

ZTFLH: TN912.3

Issue Date: 15 April 2017

Authors: SUN Jiasong, ZHANG Jingyun, YANG Yi

Cite this article:

SUN Jiasong, ZHANG Jingyun, YANG Yi. Effective audio fingerprint retrieval based on the spectral sub-band centroid feature[J]. Journal of Tsinghua University (Science and Technology), 2017, 57(4): 382-387.

URL:

http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2017.25.008 OR http://jst.tsinghuajournals.com/EN/Y2017/V57/I4/382

