Journal of Tsinghua University (Science and Technology), 2024, Vol. 64, Issue 10: 1696-1705. DOI: 10.16511/j.cnki.qhdxxb.2024.27.018
Special Topic: Robot Engineering

Optimization-based parallel learning of quadruped robot locomotion skills

ZHANG Siyuan1,2, ZHU Xiaoqing1,2, CHEN Jiangtao1,2, LIU Xinyuan1,2, WANG Tao1,2

Abstract

[Objective] Inspired by the skill learning of quadruped animals in nature, deep reinforcement learning has been widely applied to learning quadruped robot locomotion skills. Through interaction with the environment, robots can autonomously learn complete motion strategies. However, traditional reinforcement learning has several drawbacks, such as large computational requirements, slow convergence, and rigid learning strategies, which substantially reduce training efficiency and incur unnecessary time costs. To address these shortcomings, this paper introduces evolutionary strategies into the soft actor-critic (SAC) algorithm, proposing an optimized parallel SAC (OP-SAC) algorithm that trains quadruped robots with evolutionary strategies and reinforcement learning in parallel.

[Methods] The algorithm first uses an SAC variant with a varying temperature coefficient to reduce the influence of the temperature hyperparameter on the training process. It then introduces evolutionary strategies, using the reference trajectory produced by the evolutionary strategy as sample input to guide the training direction of the SAC algorithm. In turn, the state information and reward values obtained from SAC training serve as the inputs and offspring selection thresholds of the evolutionary strategy, decoupling the training data of the two learners. The algorithm adopts an alternating training scheme with a knowledge-sharing strategy, storing the training results of the evolutionary strategy and reinforcement learning in a common experience pool. A knowledge inheritance mechanism is also introduced, allowing the training results of both strategies to be passed on to the next stage of the algorithm. With these two strategies, the evolutionary strategy and reinforcement learning guide each other's training direction and pass useful information between generations, accelerating learning and enhancing the robustness of the algorithm.

[Results] The simulation experiments showed the following. 1) Training quadruped robots with the OP-SAC algorithm converges to a reward value of approximately 3,000, with stable posture and high speed after training; the algorithm can effectively complete the bionic gait learning of quadruped robots. 2) Compared with other algorithms combining SAC and evolutionary strategies, OP-SAC converges faster and reaches a higher reward value after convergence, demonstrating more robust learned strategies. 3) Although OP-SAC converges more slowly than reinforcement learning algorithms combined with a central pattern generator, it ultimately achieves a higher reward value and more stable training results. 4) The ablation experiments confirm that the knowledge inheritance and knowledge-sharing strategies are important for improving training effectiveness.

[Conclusions] The proposed OP-SAC algorithm accomplishes quadruped robot locomotion skill learning, improves the convergence speed of reinforcement learning to a certain extent, optimizes the learning strategy, and significantly enhances training efficiency.
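The sketch below illustrates the alternating scheme just described. It is a minimal conceptual illustration, not the authors' implementation: a toy numpy objective stands in for the quadruped simulation, and rollout_reward, es_phase, sac_phase, and shared_pool are hypothetical names introduced here for exposition. Only the control flow from the abstract is kept: ES offspring are gated by the reward achieved in the preceding RL phase, both learners write to one shared experience pool (knowledge sharing), and each stage's best parameters seed the next stage (knowledge inheritance).

    # Minimal sketch of the OP-SAC alternation described above. All names here
    # (rollout_reward, es_phase, sac_phase, shared_pool) are hypothetical
    # stand-ins; the real method trains a full SAC agent in simulation.
    import numpy as np
    from collections import deque

    rng = np.random.default_rng(0)
    DIM = 8                               # toy policy-parameter dimension

    def rollout_reward(theta):
        # Toy stand-in for a quadruped rollout: returns a scalar episode reward.
        return -np.sum((theta - 1.0) ** 2) + rng.normal(scale=0.1)

    shared_pool = deque(maxlen=10_000)    # knowledge sharing: one pool for both learners

    def es_phase(parent, pop_size=16, sigma=0.1, threshold=None):
        # (mu, lambda)-style ES step; 'threshold' is the offspring selection bar
        # taken from the reward the preceding RL phase achieved.
        offspring = parent + sigma * rng.standard_normal((pop_size, DIM))
        rewards = np.array([rollout_reward(th) for th in offspring])
        keep = rewards >= (np.median(rewards) if threshold is None else threshold)
        if not keep.any():
            keep = rewards == rewards.max()           # always keep the elite
        for th, r in zip(offspring[keep], rewards[keep]):
            shared_pool.append((th.copy(), r))        # ES experience enters the pool
        return offspring[rewards.argmax()], rewards.max()

    def sac_phase(theta, steps=50, lr=1e-2):
        # Toy stand-in for SAC updates: nudge parameters toward high-reward
        # samples drawn from the shared pool, so ES trajectories guide RL.
        for _ in range(steps):
            if shared_pool:
                ref, _ = shared_pool[int(rng.integers(len(shared_pool)))]
                theta = theta + lr * (ref - theta)    # pull toward a reference sample
            shared_pool.append((theta.copy(), rollout_reward(theta)))  # RL experience shared back
        return theta, rollout_reward(theta)

    # Alternating training; each stage's best parameters are inherited by the next.
    theta, rl_reward = rng.standard_normal(DIM), None
    for stage in range(20):
        theta, es_reward = es_phase(theta, threshold=rl_reward)   # RL reward gates ES offspring
        theta, rl_reward = sac_phase(theta)                       # ES elite seeds the RL phase
    print("final reward:", float(rollout_reward(theta)))

In the actual method, sac_phase would run full SAC gradient updates with the varying temperature coefficient, and es_phase would evaluate reference trajectories in the physics simulator; the sketch preserves only the data flow between the two learners.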

Key words

biological evolution / reinforcement learning / quadruped robot / evolutionary strategy

Cite this article

ZHANG Siyuan, ZHU Xiaoqing, CHEN Jiangtao, LIU Xinyuan, WANG Tao. Optimization-based parallel learning of quadruped robot locomotion skills[J]. Journal of Tsinghua University (Science and Technology), 2024, 64(10): 1696-1705. https://doi.org/10.16511/j.cnki.qhdxxb.2024.27.018

Funding

Young Scientists Fund of the National Natural Science Foundation of China (62103009); General Program of Beijing Natural Science Foundation (4202005)
