Journal of Tsinghua University (Science and Technology), 2017, Vol. 57, Issue 1: 18-23    DOI: 10.16511/j.cnki.qhdxxb.2017.21.004
COMPUTER SCIENCE AND TECHNOLOGY
Spoken term detection based on DTW
HOU Jingyong1, XIE Lei1, YANG Peng1, XIAO Xiong2, LEUNG Cheung-Chi3, XU Haihua2, WANG Lei3, LV Hang1, MA Bin3, CHNG EngSiong2,4, LI Haizhou2,3,4
1. Shaanxi Provincial Key Laboratory of Speech and Image Information Processing (SAIIP), School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China;
2. Temasek Lab, Nanyang Technological University, Singapore;
3. Institute for Infocomm Research, A*STAR, Singapore;
4. School of Computer Engineering, Nanyang Technological University, Singapore
Abstract: Spoken term detection (STD) for low-resource languages has drawn much interest. A partial matching strategy based on phoneme boundaries is presented to solve the fuzzy matching problem in query-by-example spoken term detection with dynamic time warping (DTW). A variety of features were used to validate the strategy on the QUESST 2014 dataset. Tests show that the strategy is effective not only for the fuzzy match tasks T2 and T3 but also for the exact match task T1, and it significantly improves performance in fusion tests.
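The abstract describes matching a spoken query against search utterances by dynamic time warping over frame-level features such as phoneme posteriorgrams. As a rough, self-contained sketch of that core matching step only (not the paper's actual system; the feature matrices and the subsequence_dtw name here are hypothetical), a Python version might look like:

    import numpy as np

    def subsequence_dtw(query, utterance):
        # query: (Q, D) and utterance: (U, D) matrices of frame-level
        # features, e.g. phoneme posteriorgrams (each row sums to 1).
        # Frame-pair distance: -log of the posteriorgram inner product.
        dist = -np.log(np.maximum(query @ utterance.T, 1e-10))
        Q, U = dist.shape
        acc = np.full((Q, U), np.inf)
        acc[0, :] = dist[0, :]            # a match may start at any utterance frame
        for i in range(1, Q):
            acc[i, 0] = dist[i, 0] + acc[i - 1, 0]
            for j in range(1, U):
                acc[i, j] = dist[i, j] + min(
                    acc[i - 1, j],        # several query frames on one utterance frame
                    acc[i, j - 1],        # one query frame spans several utterance frames
                    acc[i - 1, j - 1])    # one-to-one match
        end = int(np.argmin(acc[-1, :]))  # best end frame in the utterance
        score = acc[-1, end] / Q          # length-normalized alignment cost
        return score, end

A partial-matching variant in the spirit of the paper would additionally allow the alignment to cover only part of the query, cutting it at hypothesized phoneme boundaries rather than forcing the whole query to match.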
Keywords: spoken term detection; low resource languages; dynamic time warping; partial matching
CLC Number: TP391.3
Issue Date: 15 January 2017
Cite this article:   
HOU Jingyong, XIE Lei, YANG Peng, et al. Spoken term detection based on DTW[J]. Journal of Tsinghua University (Science and Technology), 2017, 57(1): 18-23.
URL: http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2017.21.004 or http://jst.tsinghuajournals.com/EN/Y2017/V57/I1/18