Q-learning environment recognition method based on odor-reward shaping

RUAN Xiaogang, LIU Pengfei, ZHU Xiaoqing

Journal of Tsinghua University (Science and Technology) ›› 2021, Vol. 61 ›› Issue (3): 254-260. DOI: 10.16511/j.cnki.qhdxxb.2020.25.036

Electronic Engineering

Q-learning environment recognition method based on odor-reward shaping

  • RUAN Xiaogang1,2, LIU Pengfei1,2, ZHU Xiaoqing1,2


Abstract

Q-learning is a model-free, value-iteration reinforcement learning algorithm that is widely used for mobile robot navigation in unstructured environments. However, the conflict between exploration and exploitation of the environment limits the convergence speed of Q-learning in mobile robot navigation. Inspired by the fact that rodents use olfactory cues for spatial orientation and navigation, this study builds on the Q-learning algorithm to develop a Q-learning environment cognition strategy based on odor-reward shaping. The algorithm reduces useless exploration of the environment by improving the Q-learning action selection strategy: guidance from an environmental odor reward is integrated into action selection, and an olfactory factor is introduced to balance the weights of the Q-learning values and the odor-reward guidance. The effectiveness of the algorithm is evaluated in simulations in the maze environment used in Tolman's rat experiments. The results show that, compared with standard Q-learning, the Q-learning algorithm with odor-reward shaping reduces useless exploration of the environment, enhances cognitive learning of the environment, and improves the convergence speed of the algorithm.
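The abstract describes the method only at a high level. The following is a minimal sketch of one plausible reading, assuming a tabular Q-learning agent in which the olfactory factor (here `beta`) linearly blends the learned Q-value with an odor-reward cue during action selection; the linear combination rule, the epsilon-greedy baseline, and all names here are assumptions for illustration, not the paper's actual implementation.

```python
import random

def select_action(q_table, odor_reward, state, actions, beta=0.3, epsilon=0.1):
    # Hypothetical odor-guided action selection: with probability epsilon
    # explore at random; otherwise pick the action maximizing a convex
    # blend of the learned Q-value and the odor cue, weighted by the
    # olfactory factor beta (the linear weighting is an assumption).
    if random.random() < epsilon:
        return random.choice(actions)
    def score(a):
        q = q_table.get((state, a), 0.0)  # unseen state-action pairs default to 0
        return (1.0 - beta) * q + beta * odor_reward(state, a)
    return max(actions, key=score)

def q_update(q_table, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    # Standard Q-learning update (Watkins & Dayan, 1992); in this sketch the
    # odor cue shapes only action selection, not the value update itself.
    best_next = max(q_table.get((s_next, b), 0.0) for b in actions)
    old = q_table.get((s, a), 0.0)
    q_table[(s, a)] = old + alpha * (r + gamma * best_next - old)
```

With a large `beta`, the odor guidance can override immature Q-values early in learning, which is one way the reported reduction in useless exploration could arise; as the Q-table converges, a smaller `beta` would let the learned values dominate.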


Key words

robot navigation / environment recognition / Q-learning / olfactory factor

Cite this article

RUAN Xiaogang, LIU Pengfei, ZHU Xiaoqing. Q-learning environment recognition method based on odor-reward shaping[J]. Journal of Tsinghua University (Science and Technology), 2021, 61(3): 254-260. https://doi.org/10.16511/j.cnki.qhdxxb.2020.25.036

References

[1] WANG Z W, GUO G. Present situation and future development of mobile robot navigation technology[J]. Robot, 2003, 25(5): 470-474. (in Chinese)
[2] SUTTON R S, BARTO A G. Reinforcement learning: An introduction[M]. Cambridge, MA: MIT Press, 2018.
[3] WATKINS C J C H, DAYAN P. Q-learning[J]. Machine Learning, 1992, 8(3-4): 279-292.
[4] YUAN R P, ZHANG F H, WANG Y, et al. A Q-learning approach based on human reasoning for navigation in a dynamic environment[J]. Robotica, 2019, 37(3): 445-468.
[5] KHRIJI L, TOUATI F, BENHMED K, et al. Mobile robot navigation based on Q-learning technique[J]. International Journal of Advanced Robotic Systems, 2011, 8(1): 4.
[6] SONG Y, LI Y B, LI C H, et al. An efficient initialization approach of Q-learning for mobile robots[J]. International Journal of Control, Automation and Systems, 2012, 10(1): 166-172.
[7] LOW E S, ONG P, CHEAH K C. Solving the optimal path planning of a mobile robot using improved Q-learning[J]. Robotics and Autonomous Systems, 2019, 115: 143-161.
[8] PANG T, RUAN X G, WANG E S, et al. Based on A* and Q-learning search and rescue robot navigation[J]. Telkomnika Indonesian Journal of Electrical Engineering, 2012, 10(7): 1889-1896.
[9] LI S D, XU X, ZUO L. Dynamic path planning of a mobile robot with improved Q-learning algorithm[C]//2015 IEEE International Conference on Information and Automation. Lijiang, China: IEEE, 2015: 409-414.
[10] NI J J, LI X Y, HUA M G, et al. Bioinspired neural network based Q-learning approach for robot path planning in unknown environments[J]. International Journal of Robotics and Automation, 2016, 31(6): 4526-4590.
[11] ITO M, MIYAKE S, SAWADA Y. A neural network model of hippocampus-basal ganglia for rat navigation tasks[J]. Electronics and Communications in Japan (Part III: Fundamental Electronic Science), 2004, 87(10): 66-80.
[12] KULVICIUS T, TAMOSIUNAITE M, AINGE J, et al. Odor supported place cell model and goal navigation in rodents[J]. Journal of Computational Neuroscience, 2008, 25(3): 481-500.
[13] KHAN A G, SARANGI M, BHALLA U S. Rats track odour trails accurately using a multi-layered strategy with near-optimal sampling[J]. Nature Communications, 2012, 3: 703.
[14] WALLACE D G, KOLB B, WHISHAW I Q. Odor tracking in rats with orbital frontal lesions[J]. Behavioral Neuroscience, 2003, 117(3): 616-620.
[15] LIU A N, PAPALE A E, HENGENIUS J, et al. Mouse navigation strategies for odor source localization[J]. Frontiers in Neuroscience, 2020, 14: 218.
[16] LI C Y, DONG H B, ZHAO K. Dual functions of insect wings in an odor-guided aeronautic navigation[J]. Journal of Fluids Engineering, 2020, 142(3): 030902.
[17] TOLMAN E C. Cognitive maps in rats and men[J]. Psychological Review, 1948, 55(4): 189.

Corresponding author

ZHU Xiaoqing, Lecturer, E-mail: alex.zhuxq@bjut.edu.cn
