Journal of Tsinghua University (Science and Technology), 2017, Vol. 57, Issue 9: 921-925. DOI: 10.16511/j.cnki.qhdxxb.2017.26.041
COMPUTER SCIENCE AND TECHNOLOGY
Automatic speech recognition by a Kinect sensor for a robot under ego noises
WANG Jianrong1, GAO Yongchun1, ZHANG Ju1, WEI Jianguo2, DANG Jianwu1
1. School of Computer Science and Technology, Tianjin University, Tianjin 300350, China;
2. School of Computer Software, Tianjin University, Tianjin 300350, China
Abstract  Audio-visual integration can effectively improve automatic speech recognition (ASR) for robots under ego noises. However, head rotations, lip movement differences, camera-subject distance, and lighting variations degrade ASR accuracy. This paper describes a robot equipped with a Kinect sensor in a multi-modal system. The Kinect provides both 3-D data and visual information, and the lip profiles are rebuilt from the 3-D data to extract more accurate information from the video. Different fusion methods were investigated to incorporate the available multi-modal information. Tests under the robot's ego noises demonstrate that the multi-modal system is superior to traditional audio-only and audio-visual speech recognition, with improved recognition robustness.
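To make the fusion idea concrete, the following is a minimal Python sketch of decision-level (late) fusion, one common way to combine audio and visual streams; the class scores, the fixed reliability weight, and the function name late_fusion are illustrative assumptions, not the specific fusion methods evaluated in the paper.

# Illustrative sketch only: decision-level (late) fusion of audio and visual
# streams for audio-visual speech recognition. The scores, class labels, and
# the weighting scheme below are hypothetical placeholders, not the paper's
# actual models or data.
import numpy as np

def late_fusion(audio_log_probs, visual_log_probs, audio_weight=0.7):
    """Combine per-class log-probabilities from the two streams.

    audio_log_probs, visual_log_probs: 1-D arrays of log-probabilities,
    one entry per candidate word/class. audio_weight in [0, 1] reflects
    how much the (possibly ego-noise-corrupted) audio stream is trusted.
    """
    visual_weight = 1.0 - audio_weight
    fused = audio_weight * audio_log_probs + visual_weight * visual_log_probs
    return int(np.argmax(fused)), fused

# Toy example: three candidate words; the audio stream is degraded by ego
# noise, so the visual (lip-profile) stream is weighted more heavily.
audio = np.log(np.array([0.40, 0.35, 0.25]))   # ambiguous audio scores
visual = np.log(np.array([0.10, 0.80, 0.10]))  # confident visual scores
best, scores = late_fusion(audio, visual, audio_weight=0.4)
print("fused decision:", best)                  # -> class index 1

In practice the stream weight would be adapted to the estimated noise level (for example, lowered when the robot's motors are active), which is the kind of trade-off the fusion comparison in the paper addresses.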
Keywords: humanoid robot; ego noises; automatic speech recognition; Kinect multi-sensor; multi-modal system
CLC Number: TP242; TN912.34
Issue Date: 15 September 2017
Cite this article:   
WANG Jianrong, GAO Yongchun, ZHANG Ju, et al. Automatic speech recognition by a Kinect sensor for a robot under ego noises[J]. Journal of Tsinghua University (Science and Technology), 2017, 57(9): 921-925.
URL: http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2017.26.041 OR http://jst.tsinghuajournals.com/EN/Y2017/V57/I9/921