Journal of Tsinghua University(Science and Technology)    2017, Vol. 57 Issue (4) : 382-387     DOI: 10.16511/j.cnki.qhdxxb.2017.25.008
ELECTRICAL ENGINEERING
Effective audio fingerprint retrieval based on the spectral sub-band centroid feature
SUN Jiasong, ZHANG Jingyun, YANG Yi
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Abstract  Key audio detection, an important form of audio retrieval, uses a query audio sample to search an audio database, but such searches are often neither efficient nor robust. This paper optimizes the pre-processing, fingerprint extraction, and retrieval stages of an audio retrieval system. The pre-processing stage uses endpoint detection based on the sub-band energy ratio, with a modified window function and measurements of the sub-band divisions. The fingerprint extraction stage uses seed fragments and spectral sub-band centroids. The retrieval stage applies a threshold to the hit counts to improve efficiency. This system improves the precision and reduces the recall rate with good noise suppression. The retrieval efficiency and performance are effectively improved.
Keywords: audio information retrieval; spectral sub-band centroids; fingerprint extraction; endpoint detection
ZTFLH:  TN912.3  
Issue Date: 15 April 2017
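The spectral sub-band centroid named in the abstract is, in its standard form, the power-weighted mean frequency of each sub-band of a frame's spectrum. The Python sketch below illustrates only that general definition. The equal-width band split, the naive DFT, and the function names power_spectrum and subband_centroids are assumptions made for illustration; the abstract does not specify the paper's window function, sub-band division, seed-fragment selection, or fingerprint quantization, so none of those are reproduced here.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum |X[k]|^2 for k = 0..N/2, using a naive DFT (fine for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def subband_centroids(frame, sample_rate, num_bands):
    """Return one power-weighted mean frequency (in Hz) per sub-band.

    Bands are equal-width in frequency bins, which is an assumption of this
    sketch and not necessarily the division used in the paper.
    """
    power = power_spectrum(frame)
    n = len(frame)
    freqs = [k * sample_rate / n for k in range(len(power))]
    bins_per_band = len(power) // num_bands if num_bands >= 1 else 0
    if bins_per_band == 0:
        raise ValueError("num_bands must be between 1 and the number of spectrum bins")
    centroids = []
    for b in range(num_bands):
        lo = b * bins_per_band
        # The last band absorbs any leftover bins.
        hi = len(power) if b == num_bands - 1 else lo + bins_per_band
        total = sum(power[lo:hi])
        if total == 0.0:
            # Silent band: fall back to the middle of the band.
            centroids.append((freqs[lo] + freqs[hi - 1]) / 2)
        else:
            weighted = sum(f * p for f, p in zip(freqs[lo:hi], power[lo:hi]))
            centroids.append(weighted / total)
    return centroids
```

For a frame that contains a single tone, the centroid of the band holding that tone sits at the tone's frequency. Centroids of this kind tend to follow spectral peaks, which is the usual motivation for using them in noise-robust fingerprints.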
Cite this article:
SUN Jiasong, ZHANG Jingyun, YANG Yi. Effective audio fingerprint retrieval based on the spectral sub-band centroid feature[J]. Journal of Tsinghua University(Science and Technology), 2017, 57(4): 382-387.
URL:  
http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2017.25.008 OR http://jst.tsinghuajournals.com/EN/Y2017/V57/I4/382
Copyright © Journal of Tsinghua University(Science and Technology), All Rights Reserved.