Journal of Tsinghua University (Science and Technology), 2021, Vol. 61, Issue 3: 254-260    DOI: 10.16511/j.cnki.qhdxxb.2020.25.036
Q-learning environment recognition method based on odor-reward shaping
RUAN Xiaogang1,2, LIU Pengfei1,2, ZHU Xiaoqing1,2
1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China;
2. Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124, China
Abstract: Q-learning is a model-free, value-iteration reinforcement learning algorithm that is widely used for mobile robot navigation in unstructured environments. In mobile robot navigation, however, the conflict between exploring and exploiting the environment slows Q-learning convergence. Building on Q-learning, and inspired by the ability of rodents to use olfactory cues for spatial orientation and navigation, this study proposes a Q-learning environment cognition strategy guided by odor rewards (odor-reward shaping). The algorithm reduces useless exploration of the environment by improving the Q-learning action selection strategy: guidance from environmental odor rewards is incorporated into action selection, and an olfactory factor is introduced to balance the relative weights of Q-learning and odor-reward guidance. To verify the algorithm's effectiveness, simulations were conducted in the maze environment used in Tolman's rat experiments. Dynamic simulation results show that, compared with standard Q-learning, Q-learning with odor-reward shaping reduces useless exploration during environment cognition, strengthens cognitive learning of the environment, and converges faster.
Key words: robot navigation; environment recognition; Q-learning; olfactory factor
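The abstract does not give the exact action-selection formula, so the following Python sketch only illustrates the idea under stated assumptions: a hypothetical 5x5 grid maze (not Tolman's maze), an assumed odor field that decays with Manhattan distance from the goal, and an assumed blended score (1 - beta) * Q(s, a) + beta * odor(s'), where `beta` stands in for the olfactory factor and s' is the cell the agent would reach. The names `MAZE`, `odor`, `select_action`, and `train`, the reward values, and all hyperparameters are hypothetical. Q-values are updated with the standard Q-learning rule; only action selection is odor-guided.

```python
import random

# Hypothetical 5x5 grid maze (not Tolman's maze): '#' = wall, 'S' = start, 'G' = goal.
MAZE = [
    "S..#.",
    ".#...",
    ".#.#.",
    "...#.",
    "##..G",
]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def find(ch):
    """Return the (row, col) of the first cell containing ch."""
    for r, row in enumerate(MAZE):
        c = row.find(ch)
        if c != -1:
            return (r, c)
    raise ValueError(f"{ch!r} not in maze")


START, GOAL = find("S"), find("G")


def step(state, a):
    """Apply action a; stay in place if the move would leave the grid or hit a wall."""
    r, c = state[0] + ACTIONS[a][0], state[1] + ACTIONS[a][1]
    if 0 <= r < len(MAZE) and 0 <= c < len(MAZE[0]) and MAZE[r][c] != "#":
        return (r, c)
    return state


def odor(state):
    """Assumed odor field: 1.0 at the goal, decaying with Manhattan distance (walls ignored)."""
    d = abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1])
    return 1.0 / (1.0 + d)


def select_action(Q, state, epsilon, beta, rng):
    """Epsilon-greedy over an assumed blend of Q-values and the odor sensed one step ahead.

    beta stands in for the olfactory factor: 0 gives plain Q-learning selection,
    1 gives pure odor-following.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    scores = [(1 - beta) * Q[state][a] + beta * odor(step(state, a))
              for a in range(len(ACTIONS))]
    best = max(scores)
    return rng.choice([a for a, s in enumerate(scores) if s == best])  # random tie-break


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, beta=0.5,
          max_steps=200, seed=0):
    """Tabular Q-learning with odor-guided action selection; returns (Q, steps per episode)."""
    rng = random.Random(seed)
    Q = {(r, c): [0.0] * len(ACTIONS)
         for r in range(len(MAZE)) for c in range(len(MAZE[0]))}
    steps_per_episode = []
    for _ in range(episodes):
        s = START
        for t in range(1, max_steps + 1):
            a = select_action(Q, s, epsilon, beta, rng)
            s2 = step(s, a)
            done = s2 == GOAL
            reward = 1.0 if done else -0.01  # assumed reward: +1 at goal, small step cost
            target = reward if done else reward + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])  # standard Q-learning update
            s = s2
            if done:
                break
        steps_per_episode.append(t)  # equals max_steps if the goal was not reached
    return Q, steps_per_episode


if __name__ == "__main__":
    for beta in (0.0, 0.5):
        _, steps = train(beta=beta)
        print(f"beta={beta}: mean steps over first 50 episodes = {sum(steps[:50]) / 50:.1f}")
```

Setting `beta=0.0` recovers plain epsilon-greedy Q-learning, so running both settings with the same seed gives a rough, illustrative comparison of exploration effort; it is not a reproduction of the paper's experiments. Because this assumed odor field ignores walls, it can point toward dead ends in other mazes; there, a smaller `beta` lets the learned Q-values outweigh misleading odor cues.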
Received: 2020-04-27      Published: 2021-03-06
RUAN Xiaogang, LIU Pengfei, ZHU Xiaoqing. Q-learning environment recognition method based on odor-reward shaping[J]. Journal of Tsinghua University (Science and Technology), 2021, 61(3): 254-260.
