Research Article
Transport robot path planning based on an advantage dueling double deep Q-network |
HE Qijia (1,2,3), WANG Qiming (1,3), LI Jiaxuan (2,4,5), WANG Zhengjia (2,4,5), WANG Tong (6)
1. National Astronomical Observatory, Chinese Academy of Sciences, Beijing 100101, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
3. Key Laboratory of FAST, Chinese Academy of Sciences, Beijing 100101, China
4. Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
5. Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China
6. School of Computer Science and Technology, Nanjing Tech University, Nanjing 211816, China
|
|
Abstract An advantage dueling double deep Q-network (AD3QN) deep reinforcement learning algorithm was developed for transport robot path planning in the actuator automatic maintenance workshop of the five-hundred-meter aperture spherical radio telescope (FAST). The state value layer of the dueling network is trained in advance so that the state value parameters are initialized according to the environmental state, which reduces the number of steps required to reach the target point for the first time. An improved greedy search strategy simplifies the balance between environmental exploration and exploitation, and the resulting action selection strategy avoids local minima in the robot path and accelerates convergence. AD3QN provides good dynamic planning and real-time performance and is flexible, robust, and accurate. The actuator maintenance workshop was modeled and the path planning capability of the network was tested before and after the improvement; simulations show that AD3QN reaches the target point for the first time 176% faster than a general dueling network. This research improves actuator maintenance efficiency, thereby extending the telescope's available observation time.
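As a rough illustration of the components the abstract names, the sketch below combines the standard dueling decomposition Q(s,a) = V(s) + A(s,a) - mean_a A(s,a), a double-DQN target, and epsilon-greedy action selection in PyTorch. This is a minimal sketch under stated assumptions, not the paper's implementation: AD3QN additionally pre-trains the state value layer and modifies the greedy strategy, and none of the layer sizes, names, or hyperparameters below are taken from the paper.

```python
# Hypothetical sketch of a dueling double DQN; all structure and
# hyperparameters here are illustrative assumptions, not from the paper.
import random
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        # Separate streams: state value V(s) and action advantage A(s, a).
        # (The paper pre-trains the value stream; that step is not shown here.)
        self.value = nn.Linear(hidden, 1)
        self.advantage = nn.Linear(hidden, n_actions)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.feature(state)
        v = self.value(h)       # shape (batch, 1)
        a = self.advantage(h)   # shape (batch, n_actions)
        # Dueling aggregation: subtract the mean advantage so that
        # V and A are identifiable from Q.
        return v + a - a.mean(dim=1, keepdim=True)

def select_action(net: DuelingQNet, state: torch.Tensor,
                  epsilon: float, n_actions: int) -> int:
    """Plain epsilon-greedy selection; the paper's improved greedy
    strategy presumably adapts this, a fixed epsilon is used here."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(net(state.unsqueeze(0)).argmax(dim=1).item())

def double_dqn_target(online: DuelingQNet, target: DuelingQNet,
                      reward: torch.Tensor, next_state: torch.Tensor,
                      done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Double-DQN target: the online net selects the next action,
    the target net evaluates it, reducing Q-value overestimation."""
    with torch.no_grad():
        best_action = online(next_state).argmax(dim=1, keepdim=True)
        next_q = target(next_state).gather(1, best_action).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q
```

The mean-subtraction in the forward pass is the usual identifiability fix from the dueling architecture, and decoupling action selection (online net) from evaluation (target net) is what distinguishes the double-DQN target from a vanilla DQN target.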
|
Keywords
FAST actuator
deep reinforcement learning
dueling network
path planning
|
Issue Date: 19 October 2022
|
|
|
|