Journal of Tsinghua University (Science and Technology)  2021, Vol. 61, Issue (3): 254-260    DOI: 10.16511/j.cnki.qhdxxb.2020.25.036
Electronic Engineering
Q-learning environment recognition method based on odor-reward shaping
RUAN Xiaogang1,2, LIU Pengfei1,2, ZHU Xiaoqing1,2
1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China;
2. Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124, China
Full text: PDF (6290 KB) | HTML
Abstract  Q-learning is a model-free, value-iteration reinforcement learning algorithm that is widely used for navigating mobile robots in unstructured environments. However, the trade-off between exploration and exploitation of the environment limits the convergence speed of Q-learning in mobile robot navigation. Inspired by the fact that rodents use olfactory cues for spatial orientation and navigation, this study extends the Q-learning algorithm into an environment-recognition strategy based on odor-reward shaping. The algorithm reduces useless exploration of the environment by improving the Q-learning action selection strategy: environmental odor rewards are integrated into action selection, and an olfactory factor is introduced to balance the weights of the Q-values and the odor-reward shaping. The effectiveness of the algorithm was evaluated in simulations of the maze environment used in Tolman's rat experiments. The results show that, compared with standard Q-learning, the Q-learning algorithm with odor-reward shaping reduces useless exploration of the environment, enhances cognitive learning of the environment, and improves the convergence speed.
Key words: robot navigation; environment recognition; Q-learning; olfactory factor
Received: 2020-04-27      Published: 2021-03-06
Corresponding author: ZHU Xiaoqing, Lecturer, E-mail: alex.zhuxq@bjut.edu.cn
Cite this article:
RUAN Xiaogang, LIU Pengfei, ZHU Xiaoqing. Q-learning environment recognition method based on odor-reward shaping. Journal of Tsinghua University (Science and Technology), 2021, 61(3): 254-260.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2020.25.036  or  http://jst.tsinghuajournals.com/CN/Y2021/V61/I3/254
Copyright © Editorial Office of Journal of Tsinghua University (Science and Technology)