Journal of Tsinghua University (Science and Technology), 2022, Vol. 62, Issue 11: 1751-1757. DOI: 10.16511/j.cnki.qhdxxb.2022.26.034
Transport robot path planning based on an advantage dueling double deep Q-network
HE Qijia1,2,3, WANG Qiming1,3, LI Jiaxuan2,4,5, WANG Zhengjia2,4,5, WANG Tong6
1. National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100101, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China;
3. Key Laboratory of FAST, Chinese Academy of Sciences, Beijing 100101, China;
4. Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China;
5. Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China;
6. School of Computer Science and Technology, Nanjing Tech University, Nanjing 211816, China
Abstract: An advantage dueling double deep Q-network (AD3QN) algorithm based on deep reinforcement learning was developed for transport robot path planning in the actuator automatic maintenance workshop of the five-hundred-meter aperture spherical radio telescope (FAST). The state-value stream of the dueling network is learned in advance so that the state-value parameters are initialized according to the environment, which reduces the number of steps needed to reach the target point for the first time. An improved greedy search algorithm makes the transition from exploring the environment to exploiting it more reasonable. An improved action selection strategy keeps the path planning from falling into local minima, further accelerating convergence. AD3QN offers strong dynamic planning ability, good real-time performance, high flexibility, strong robustness, and high accuracy. The actuator maintenance workshop was modeled and the path planning capability of the network was tested before and after the improvements; simulations show that AD3QN finds the target point for the first time 176% faster than a general dueling network. This research is expected to improve FAST actuator maintenance efficiency and thereby reduce the encroachment on FAST observation time.
Key words: FAST actuator; deep reinforcement learning; dueling network; path planning
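The abstract names three standard deep reinforcement learning building blocks that AD3QN extends: a dueling network with separate state-value and advantage streams, a double deep Q-network update, and a greedy (epsilon-greedy) exploration schedule. The sketch below is a minimal PyTorch illustration of those generic components only, not the authors' AD3QN implementation; the layer sizes, decay schedule, and every function and variable name are assumptions made for this example.

import random

import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    # Dueling architecture: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').
    # Per the abstract, AD3QN pre-learns the state-value stream; in this
    # generic sketch V is randomly initialized like any other layer.
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantage stream A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.feature(state)
        v = self.value(h)       # shape (batch, 1)
        a = self.advantage(h)   # shape (batch, n_actions)
        # Subtracting the mean advantage keeps V and A identifiable.
        return v + a - a.mean(dim=1, keepdim=True)

def epsilon_greedy(q_net: DuelingQNet, state: torch.Tensor, step: int,
                   eps_start: float = 1.0, eps_end: float = 0.05,
                   decay: float = 0.999) -> int:
    # Decaying epsilon-greedy: explore heavily at first, exploit later.
    eps = max(eps_end, eps_start * decay ** step)
    if random.random() < eps:
        return random.randrange(q_net.advantage.out_features)
    with torch.no_grad():
        return int(q_net(state.unsqueeze(0)).argmax(dim=1).item())

def double_dqn_target(online: DuelingQNet, target: DuelingQNet,
                      reward: torch.Tensor, next_state: torch.Tensor,
                      done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    # Double DQN: the online net selects the next action, the target net
    # evaluates it, which reduces Q-value overestimation of a plain DQN.
    with torch.no_grad():
        best_action = online(next_state).argmax(dim=1, keepdim=True)
        next_q = target(next_state).gather(1, best_action).squeeze(1)
    return reward + gamma * (1.0 - done) * next_q

The mean-subtraction in forward() is what makes the value/advantage split identifiable, and the separately parameterized value stream is the layer that, under the assumption above, AD3QN's pre-learning would initialize from the environment state.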
Received: 2021-12-02; Published online: 2022-10-19
Supported by the National Key Research and Development Program of China (No. 2019YFB1312702)
Corresponding author: WANG Qiming, professor, E-mail: qmwang@bao.ac.cn
About the first author: HE Qijia (born 1996), male, Ph.D. candidate.
Cite this article:
HE Qijia, WANG Qiming, LI Jiaxuan, WANG Zhengjia, WANG Tong. Transport robot path planning based on an advantage dueling double deep Q-network. Journal of Tsinghua University (Science and Technology), 2022, 62(11): 1751-1757.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2022.26.034 or http://jst.tsinghuajournals.com/CN/Y2022/V62/I11/1751