Journal of Tsinghua University(Science and Technology), 2021, Vol. 61, Issue 3: 254-260. DOI: 10.16511/j.cnki.qhdxxb.2020.25.036
ELECTRONIC ENGINEERING
Q-learning environment recognition method based on odor-reward shaping
RUAN Xiaogang1,2, LIU Pengfei1,2, ZHU Xiaoqing1,2
1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China;
2. Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124, China
Abstract  Q-learning is a model-free, iterative reinforcement learning algorithm that is widely used for mobile robot navigation in unstructured environments. However, the trade-off between exploring the environment and exploiting what has already been learned limits the convergence speed of Q-learning in navigation tasks. Drawing on the fact that rodents use olfactory cues for spatial orientation and navigation, this study develops a Q-learning strategy for environmental cognition based on odor-reward shaping. The algorithm reduces useless exploration of the environment by improving the Q-learning action selection strategy: environmental odor information is integrated into the algorithm, and an olfactory factor weights the Q-learning term and the odor-reward shaping term during action selection. The effectiveness of the algorithm is evaluated in a simulation of movement through the maze environment used in Tolman's rat experiments. The results show that Q-learning with odor-reward shaping reduces useless exploration of the environment, enhances cognitive learning of the environment, and improves convergence speed.
Keywords: robot navigation; environment recognition; Q-learning; olfactory factor
Issue Date: 06 March 2021
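
The abstract describes the action selection only in words, so the following is a minimal sketch of one way an olfactory factor could weight Q-values against an odor signal. It is not the authors' formulation. The 5x5 grid, the odor field that decays with Manhattan distance, the weighted-sum score, and every name and constant (`SIGMA`, `odor`, `select_action`, `train`) are assumptions made for illustration.

```python
import random

# All names and constants are illustrative assumptions, not taken from the paper.
SIZE = 5                                       # side of a hypothetical square grid
GOAL = (4, 4)                                  # goal cell, also the odor source
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # (row, column) moves
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1          # learning rate, discount, exploration
SIGMA = 0.5                                    # assumed olfactory factor in [0, 1]


def odor(state):
    """Assumed odor intensity: decays with Manhattan distance to the source."""
    distance = abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1])
    return 1.0 / (1.0 + distance)


def step(state, action):
    """Move inside the grid; bumping into a wall leaves the agent in place."""
    row = min(max(state[0] + action[0], 0), SIZE - 1)
    col = min(max(state[1] + action[1], 0), SIZE - 1)
    nxt = (row, col)
    return nxt, (1.0 if nxt == GOAL else 0.0)


def select_action(q, state, rng, sigma=SIGMA):
    """Epsilon-greedy over an assumed weighted sum of Q-value and next-cell odor."""
    if rng.random() < EPSILON:
        return rng.randrange(len(ACTIONS))
    scores = [
        (1.0 - sigma) * q[(state, a)] + sigma * odor(step(state, ACTIONS[a])[0])
        for a in range(len(ACTIONS))
    ]
    return max(range(len(ACTIONS)), key=scores.__getitem__)


def train(episodes=200, max_steps=200, seed=0, sigma=SIGMA):
    """Run tabular Q-learning; return the Q-table and the steps used per episode."""
    rng = random.Random(seed)
    q = {((r, c), a): 0.0
         for r in range(SIZE) for c in range(SIZE) for a in range(len(ACTIONS))}
    steps_per_episode = []
    for _ in range(episodes):
        state = (0, 0)
        for t in range(1, max_steps + 1):
            a = select_action(q, state, rng, sigma)
            nxt, reward = step(state, ACTIONS[a])
            best_next = max(q[(nxt, b)] for b in range(len(ACTIONS)))
            # Standard Q-learning update; the odor term only biases action choice.
            q[(state, a)] += ALPHA * (reward + GAMMA * best_next - q[(state, a)])
            state = nxt
            if state == GOAL:
                break
        steps_per_episode.append(t)
    return q, steps_per_episode
```

Setting `sigma=0.0` in this sketch falls back to plain epsilon-greedy Q-learning, which gives a simple baseline for comparing the steps used per episode.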
Cite this article:   
RUAN Xiaogang,LIU Pengfei,ZHU Xiaoqing. Q-learning environment recognition method based on odor-reward shaping[J]. Journal of Tsinghua University(Science and Technology), 2021, 61(3): 254-260.
URL: http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2020.25.036 or http://jst.tsinghuajournals.com/EN/Y2021/V61/I3/254
  
  
  
  
  
  
  
  
Copyright © Journal of Tsinghua University(Science and Technology), All Rights Reserved.