Journal of Tsinghua University (Science and Technology)    2021, Vol. 61, Issue 9: 881-888    DOI: 10.16511/j.cnki.qhdxxb.2020.22.038
INTELLIGENT VEHICLE
End-to-end self-driving policy based on the deep deterministic policy gradient algorithm considering the state distribution
WANG Tinghan, LUO Yugong, LIU Jinxin, LI Keqiang
State Key Laboratory of Automotive Safety and Energy, Tsinghua University, Beijing 100084, China
Abstract  End-to-end control is one approach to self-driving. However, end-to-end self-driving methods based on reinforcement learning have trouble handling widely varying driving scenarios. In complex scenarios, the learning algorithm cannot easily determine how quickly the random exploration noise should decay during training. If the noise decays too quickly, the algorithm cannot learn a reasonable policy for new scenarios; if it decays too slowly, the algorithm converges slowly. A random policy and experience replay method based on the state distribution is developed here to improve the selection of the noise decay rate. The random process parameters are selected based on the distance between the present state and the saved states. In addition, the replay probability of scenes that occur less frequently is increased. Simulations show that the algorithm retains sufficient exploration ability in the later stages of training when facing scenes that differ greatly from those seen in the early stages, which improves the lane-keeping ability of the end-to-end self-driving approach based on the deep deterministic policy gradient (DDPG) algorithm in new situations.
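The abstract outlines two mechanisms: exploration noise whose magnitude depends on how far the current state is from previously saved states, and replay sampling that favors rarely seen scenes. The sketch below illustrates one plausible reading of those ideas in Python; the class and function names, the Gaussian-kernel novelty measure, and the sigma bounds are illustrative assumptions, not the paper's actual implementation.

import numpy as np

# Illustrative sketch only: StateDistBuffer, the kernel-density novelty
# measure, and the sigma bounds are assumptions for exposition, not the
# paper's actual design.
class StateDistBuffer:
    """Replay buffer that tracks how novel each state is relative to the
    states saved so far."""

    def __init__(self, capacity=100_000, bandwidth=1.0):
        self.capacity = capacity
        self.bandwidth = bandwidth        # kernel width for state distances
        self.states = []                  # saved states
        self.transitions = []             # matching (s, a, r, s') tuples

    def density(self, state):
        """Kernel density estimate of `state` under the saved states (0..1)."""
        if not self.states:
            return 0.0
        d = np.linalg.norm(np.stack(self.states) - np.asarray(state), axis=1)
        return float(np.mean(np.exp(-(d / self.bandwidth) ** 2)))

    def add(self, state, transition):
        if len(self.states) >= self.capacity:   # drop the oldest entry
            self.states.pop(0)
            self.transitions.pop(0)
        self.states.append(np.asarray(state, dtype=np.float32))
        self.transitions.append(transition)

    def sample(self, batch_size, rng=None):
        """Sample transitions with probability inversely related to state
        density, so rarely visited scenes are replayed more often."""
        rng = rng if rng is not None else np.random.default_rng()
        dens = np.array([self.density(s) for s in self.states])
        weights = 1.0 / (dens + 1e-3)
        probs = weights / weights.sum()
        idx = rng.choice(len(self.transitions), size=batch_size, p=probs)
        return [self.transitions[i] for i in idx]

def ou_noise_sigma(buffer, state, sigma_min=0.05, sigma_max=0.5):
    """Scale the Ornstein-Uhlenbeck noise magnitude by state novelty: a state
    far from the saved states gets a large sigma, i.e. more exploration."""
    novelty = 1.0 - buffer.density(state)    # in [0, 1]
    return sigma_min + novelty * (sigma_max - sigma_min)

Under this scheme the exploration noise stays large whenever the agent encounters a state unlike anything in the buffer, even late in training, which matches the behavior the abstract claims for scenes that differ from the early-stage situations.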
Keywords: end-to-end self-driving; reinforcement learning; random policy; experience replay
Issue Date: 21 August 2021
Cite this article:   
WANG Tinghan, LUO Yugong, LIU Jinxin, et al. End-to-end self-driving policy based on the deep deterministic policy gradient algorithm considering the state distribution[J]. Journal of Tsinghua University (Science and Technology), 2021, 61(9): 881-888.
URL:  
http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2020.22.038     OR     http://jst.tsinghuajournals.com/EN/Y2021/V61/I9/881