Journal of Tsinghua University (Science and Technology), 2022, Vol. 62, Issue 11: 1751-1757. DOI: 10.16511/j.cnki.qhdxxb.2022.26.034
Transport robot path planning based on an advantage dueling double deep Q-network
HE Qijia1,2,3, WANG Qiming1,3, LI Jiaxuan2,4,5, WANG Zhengjia2,4,5, WANG Tong6
1. National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100101, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China;
3. Key Laboratory of FAST, Chinese Academy of Sciences, Beijing 100101, China;
4. Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China;
5. Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China;
6. School of Computer Science and Technology, Nanjing Tech University, Nanjing 211816, China
Abstract: An advantage dueling double deep Q-network (AD3QN) algorithm based on deep reinforcement learning was developed for transport robot path planning in the actuator automatic maintenance workshop of the five-hundred-meter aperture spherical radio telescope (FAST). The state-value stream of the dueling network is learned in advance so that the state-value parameters are initialized according to the environment, which reduces the number of steps needed to reach the target point for the first time. An improved greedy search algorithm makes the transition from exploring the environment to exploiting it more reasonable. An improved action selection strategy keeps the path planning from falling into local minima, further accelerating convergence. AD3QN offers strong dynamic planning ability, good real-time performance, high flexibility, strong robustness, and high accuracy. The actuator maintenance workshop was modeled and the path planning capability of the network was tested before and after the improvements; simulations show that AD3QN finds the target point for the first time 176% faster than a general dueling network. This research is expected to improve FAST actuator maintenance efficiency and thereby reduce the encroachment on FAST observation time.
Key words: FAST actuator; deep reinforcement learning; dueling network; path planning
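The abstract names three standard deep reinforcement learning building blocks that AD3QN extends: a dueling network with separate state-value and advantage streams, a double deep Q-network update, and a greedy (epsilon-greedy) exploration schedule. The sketch below is a minimal PyTorch illustration of those generic components only, not the authors' AD3QN implementation; the layer sizes, decay schedule, and every function and variable name are assumptions made for this example.

import random

import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    # Dueling architecture: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').
    # Per the abstract, AD3QN pre-learns the state-value stream; in this
    # generic sketch V is randomly initialized like any other layer.
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantage stream A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.feature(state)
        v = self.value(h)       # shape (batch, 1)
        a = self.advantage(h)   # shape (batch, n_actions)
        # Subtracting the mean advantage keeps V and A identifiable.
        return v + a - a.mean(dim=1, keepdim=True)

def epsilon_greedy(q_net: DuelingQNet, state: torch.Tensor, step: int,
                   eps_start: float = 1.0, eps_end: float = 0.05,
                   decay: float = 0.999) -> int:
    # Decaying epsilon-greedy: explore heavily at first, exploit later.
    eps = max(eps_end, eps_start * decay ** step)
    if random.random() < eps:
        return random.randrange(q_net.advantage.out_features)
    with torch.no_grad():
        return int(q_net(state.unsqueeze(0)).argmax(dim=1).item())

def double_dqn_target(online: DuelingQNet, target: DuelingQNet,
                      reward: torch.Tensor, next_state: torch.Tensor,
                      done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    # Double DQN: the online net selects the next action, the target net
    # evaluates it, which reduces Q-value overestimation of a plain DQN.
    with torch.no_grad():
        best_action = online(next_state).argmax(dim=1, keepdim=True)
        next_q = target(next_state).gather(1, best_action).squeeze(1)
    return reward + gamma * (1.0 - done) * next_q

The mean-subtraction in forward() is what makes the value/advantage split identifiable, and the separately parameterized value stream is the layer that, under the assumption above, AD3QN's pre-learning would initialize from the environment state.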
Received: 2021-12-02; Published online: 2022-10-19
Supported by the National Key Research and Development Program of China (No. 2019YFB1312702)
Corresponding author: WANG Qiming, professor, E-mail: qmwang@bao.ac.cn
About the first author: HE Qijia (born 1996), male, Ph.D. candidate.
Cite this article:
HE Qijia, WANG Qiming, LI Jiaxuan, WANG Zhengjia, WANG Tong. Transport robot path planning based on an advantage dueling double deep Q-network. Journal of Tsinghua University (Science and Technology), 2022, 62(11): 1751-1757.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2022.26.034 or http://jst.tsinghuajournals.com/CN/Y2022/V62/I11/1751