Journal of Tsinghua University (Science and Technology), 2021, Vol. 61, Issue 9: 881-888    DOI: 10.16511/j.cnki.qhdxxb.2020.22.038
End-to-end self-driving policy based on the deep deterministic policy gradient algorithm considering the state distribution
WANG Tinghan, LUO Yugong, LIU Jinxin, LI Keqiang
State Key Laboratory of Automotive Safety and Energy, Tsinghua University, Beijing 100084, China
Abstract: End-to-end control is one approach to self-driving, but driving scenarios are highly varied and their features differ greatly, which makes it difficult to set the decay rate of the exploration randomness when training an end-to-end self-driving policy with reinforcement learning. If the randomness decays too quickly, the policy cannot handle new scenarios well; if it decays too slowly, the algorithm converges slowly. To address this problem, this paper proposes a random policy and experience replay method based on screening the input state distribution: the distance between the current input state and the stored states is computed, the random-process parameters are selected according to that distance, and the replay probability of scenes that occur infrequently is increased. Simulations show that, in the later stages of training, the algorithm still has sufficient exploration ability when it encounters scenes whose distribution differs greatly from the earlier data, which improves the lane-keeping ability of the end-to-end self-driving policy based on the deep deterministic policy gradient (DDPG) algorithm in completely new operating conditions.

Key words: end-to-end self-driving; reinforcement learning; random policy; experience replay
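The paper provides no source code; the Python sketch below is only an illustration of the mechanism the abstract describes, combining DDPG-style Ornstein-Uhlenbeck exploration noise whose parameters are switched according to the distance between the current state and previously stored states, with a replay buffer that samples rarely seen states more often. The class names (OUNoise, StateAwareExploration, FrequencyWeightedReplay), the distance threshold, and the noise parameters are assumptions made for illustration, not values taken from the paper.

import numpy as np

class OUNoise:
    # Ornstein-Uhlenbeck exploration noise, the random process commonly used with DDPG.
    def __init__(self, dim, theta=0.15, sigma=0.2, mu=0.0):
        self.dim, self.theta, self.sigma, self.mu = dim, theta, sigma, mu
        self.x = np.full(dim, mu, dtype=np.float64)

    def sample(self):
        # One Euler step of dx = theta * (mu - x) + sigma * dW.
        self.x += self.theta * (self.mu - self.x) + self.sigma * np.random.randn(self.dim)
        return self.x


class StateAwareExploration:
    # Picks the OU-noise parameters from the distance between the current state and
    # previously stored states: novel states keep strong exploration, familiar states
    # use weak noise. The threshold and sigma values here are illustrative only.
    def __init__(self, action_dim, dist_threshold=1.0):
        self.seen_states = []
        self.dist_threshold = dist_threshold
        self.noise_near = OUNoise(action_dim, sigma=0.05)  # familiar region: small noise
        self.noise_far = OUNoise(action_dim, sigma=0.30)   # novel region: large noise

    def min_distance(self, state):
        if not self.seen_states:
            return np.inf
        return min(np.linalg.norm(state - s) for s in self.seen_states)

    def perturb(self, state, action):
        d = self.min_distance(state)
        noise = self.noise_far if d > self.dist_threshold else self.noise_near
        self.seen_states.append(np.asarray(state, dtype=np.float64).copy())
        return action + noise.sample(), d


class FrequencyWeightedReplay:
    # Replay buffer that samples transitions from rarely seen states more often,
    # weighting each transition by the state distance recorded when it was stored.
    def __init__(self, capacity=100000):
        self.buffer, self.weights, self.capacity = [], [], capacity

    def add(self, transition, state_distance):
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.weights.pop(0)
        self.buffer.append(transition)
        # Cap the distance so the weight stays finite for the very first transitions.
        self.weights.append(1.0 + min(state_distance, 10.0))

    def sample(self, batch_size):
        p = np.asarray(self.weights)
        p = p / p.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=p)
        return [self.buffer[i] for i in idx]

In a DDPG training loop, perturb() would wrap the actor's output before it is sent to the simulator, and the distance it returns would be passed to add() when the transition is stored, so that sampling favors transitions gathered in unfamiliar states.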
Received: 2020-09-08      Published: 2021-08-21
Funding: National Natural Science Foundation of China (51975310); National Key R&D Program of China (2016YFB0100905)
Corresponding author: LI Keqiang, professor, E-mail: likq@tsinghua.edu.cn
Cite this article:
WANG Tinghan, LUO Yugong, LIU Jinxin, LI Keqiang. End-to-end self-driving policy based on the deep deterministic policy gradient algorithm considering the state distribution[J]. Journal of Tsinghua University (Science and Technology), 2021, 61(9): 881-888.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2020.22.038  or  http://jst.tsinghuajournals.com/CN/Y2021/V61/I9/881