Journal of Tsinghua University(Science and Technology)    2022, Vol. 62 Issue (11) : 1751-1757     DOI: 10.16511/j.cnki.qhdxxb.2022.26.034
Research Article
Transport robot path planning based on an advantage dueling double deep Q-network
HE Qijia1,2,3, WANG Qiming1,3, LI Jiaxuan2,4,5, WANG Zhengjia2,4,5, WANG Tong6
1. National Astronomical Observatory, Chinese Academy of Sciences, Beijing 100101, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China;
3. Key Laboratory of FAST, Chinese Academy of Sciences, Beijing 100101, China;
4. Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China;
5. Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China;
6. School of Computer Science and Technology, Nanjing Tech University, Nanjing 211816, China
Abstract  An advantage dueling double deep Q-network (AD3QN) algorithm based on deep reinforcement learning was developed for transport robot path planning in the automatic maintenance workshop for the actuators of the five-hundred-meter aperture spherical radio telescope (FAST). The state value layer of the dueling network is learned in advance so that the state value parameters are initialized from the environmental state, which reduces the number of steps needed to reach the target point for the first time. An improved greedy search strategy simplifies the balance between exploration and exploitation of the environment, and the action selection strategy avoids local minima in the robot path and accelerates convergence. AD3QN provides good dynamic planning and real-time performance and is flexible, robust, and accurate. The actuator maintenance workshop was modeled and the path planning capability of the network was tested before and after the improvements; simulations show that AD3QN finds the target point for the first time 176% faster than a general dueling network. This research improves actuator maintenance efficiency, which extends the telescope's available observation time.
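The two mechanisms the abstract combines can be illustrated in a minimal Python sketch. This is not the authors' implementation: the function names, scalar rewards, and discrete action set are assumptions for illustration only. It shows (1) the dueling aggregation Q(s,a) = V(s) + A(s,a) - mean_a A(s,a), (2) the double-DQN bootstrap target, where the online network selects the next action and the target network evaluates it, and (3) ε-greedy action selection.

```python
import random

def dueling_q(value, advantages):
    """Dueling aggregation: combine a scalar state value V(s) with
    per-action advantages A(s,a), centering the advantages so the
    decomposition is identifiable."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double-DQN target for one transition: the online network picks
    the argmax action, the target network supplies its value."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best]

def epsilon_greedy(q_values, epsilon, rng=random):
    """Epsilon-greedy selection: explore with probability epsilon,
    otherwise exploit the current greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

In the paper's scheme, pre-learning the state value layer would amount to initializing the parameters behind `dueling_q`'s `value` term from the environment before training, and the "improved greedy" strategy would govern how `epsilon` is decayed.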
Keywords: FAST actuator; deep reinforcement learning; dueling network; path planning
Issue Date: 19 October 2022
Cite this article:   
HE Qijia, WANG Qiming, LI Jiaxuan, et al. Transport robot path planning based on an advantage dueling double deep Q-network[J]. Journal of Tsinghua University(Science and Technology), 2022, 62(11): 1751-1757.
URL:  
http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2022.26.034     OR     http://jst.tsinghuajournals.com/EN/Y2022/V62/I11/1751