Q-learning environment recognition method based on odor-reward shaping
RUAN Xiaogang1,2, LIU Pengfei1,2, ZHU Xiaoqing1,2
1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; 2. Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124, China
Abstract: Q-learning is a model-free, iterative reinforcement learning algorithm that is widely used for mobile robot navigation in unstructured environments. However, the trade-off between exploring the environment and exploiting what has already been learned limits how quickly Q-learning converges in mobile robot navigation. Drawing on the way rodents use olfactory cues for spatial orientation and navigation, this study develops a Q-learning environment cognition strategy based on odor-reward shaping. The algorithm reduces useless exploration of the environment by improving the Q-learning action selection strategy: environmental odor information is integrated into action selection, with an olfactory factor weighting the Q-values against the odor-reward shaping term. The algorithm's effectiveness is evaluated in a simulation of movement through the maze environment used in Tolman's rat experiments. The results show that Q-learning with odor-reward shaping reduces useless exploration of the environment, enhances cognitive learning of the environment, and speeds up convergence.
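The action selection idea described above can be illustrated with a small tabular example. This is a minimal sketch under stated assumptions, not the authors' algorithm. The maze layout, the reward values, the exponential odor field, the linear blend `(1 - w) * Q(s, a) + w * odor(s')`, and every name (`MAZE`, `odor`, `select_action`, `train`, `w`) are hypothetical choices made for illustration. The value update is the standard Watkins Q-learning rule; the odor term only biases which action is chosen.

```python
import math
import random

# Hypothetical 5x5 grid maze (0 = free cell, 1 = wall); NOT the Tolman maze layout.
MAZE = [
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
]
START, GOAL = (0, 0), (4, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Move one cell; walls and borders leave the agent where it is."""
    r, c = state[0] + action[0], state[1] + action[1]
    if 0 <= r < len(MAZE) and 0 <= c < len(MAZE[0]) and MAZE[r][c] == 0:
        state = (r, c)
    reward = 1.0 if state == GOAL else -0.01  # assumed reward scheme
    return state, reward, state == GOAL


def odor(state, sigma=2.0):
    """Assumed odor field: concentration decays with Manhattan distance to the goal."""
    d = abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1])
    return math.exp(-d / sigma)


def select_action(q, state, w, epsilon, rng):
    """Epsilon-greedy over a blend of Q-values and the odor at the next cell.

    w is a hypothetical olfactory weighting factor in [0, 1]; w = 0 is plain Q-learning.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    scores = []
    for i, a in enumerate(ACTIONS):
        nxt, _, _ = step(state, a)  # peek at the cell this action would reach
        scores.append((1 - w) * q[state][i] + w * odor(nxt))
    best = max(scores)
    return rng.choice([i for i, s in enumerate(scores) if s == best])  # random tie-break


def train(episodes=200, w=0.5, alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=200, seed=0):
    """Tabular Q-learning; returns the Q-table and the step count of each episode."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(len(MAZE)) for c in range(len(MAZE[0]))]
    q = {s: [0.0] * len(ACTIONS) for s in cells}
    steps_per_episode = []
    for _ in range(episodes):
        state = START
        for t in range(1, max_steps + 1):
            i = select_action(q, state, w, epsilon, rng)
            nxt, reward, done = step(state, ACTIONS[i])
            # Standard Q-learning target; no bootstrap from the terminal state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][i] += alpha * (target - q[state][i])
            state = nxt
            if done:
                break
        steps_per_episode.append(t)
    return q, steps_per_episode


if __name__ == "__main__":
    for w in (0.0, 0.5):
        _, steps = train(w=w)
        print(f"w={w}: mean steps over first 10 episodes = {sum(steps[:10]) / 10:.1f}")
```

Setting `w = 0` recovers ordinary ε-greedy Q-learning, so comparing per-episode step counts for `w = 0` and `w > 0` is one simple way to probe whether odor guidance cuts down useless exploration in this toy maze. An odor field that ignores walls can lure a greedy agent into dead ends; ε-exploration and the learned Q-values are what allow it to recover.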