Abstract: End-to-end control is one approach to self-driving. However, end-to-end self-driving methods based on reinforcement learning have trouble handling widely varying driving scenarios. In complex scenarios, the learning algorithm cannot easily determine how quickly the exploration randomness should decay during training: if the randomness decays too fast, the algorithm fails to obtain a reasonable policy for new scenarios, while if it decays too slowly, the algorithm converges slowly. A random policy and experience replay method based on the state distribution is developed here to improve the selection of this decay rate. The random process parameters are selected based on the distance between the current state and previously saved states, and the replay probability of scenes that occur less frequently is increased. Simulations show that the algorithm retains sufficient exploration ability in the later stages of training when faced with scenes that differ greatly from those encountered in the early stages, which improves the lane-keeping ability of the end-to-end self-driving approach based on the deep deterministic policy gradient (DDPG) algorithm in new situations.
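The following is a minimal Python sketch, not the authors' implementation, of the two ideas stated in the abstract: scaling the exploration-noise magnitude by how far the current state lies from states already stored, and replaying rarely seen states with higher probability. The buffer class, the k-nearest-neighbour novelty measure, the sigma bounds, and the fixed density radius are all illustrative assumptions.

```python
import numpy as np

class StateDistributionBuffer:
    """Illustrative buffer coupling state-distance-based noise scaling
    with frequency-aware replay sampling (hypothetical names/parameters)."""

    def __init__(self, capacity=100_000, k=10, sigma_min=0.05, sigma_max=0.5):
        self.states, self.transitions = [], []
        self.capacity, self.k = capacity, k
        self.sigma_min, self.sigma_max = sigma_min, sigma_max

    def add(self, state, transition):
        if len(self.states) >= self.capacity:        # drop oldest entry when full
            self.states.pop(0)
            self.transitions.pop(0)
        self.states.append(np.asarray(state, dtype=float))
        self.transitions.append(transition)

    def noise_scale(self, state):
        """Return a larger exploration-noise sigma when the current state
        is far from every stored state (i.e., a novel scene)."""
        if not self.states:
            return self.sigma_max
        d = np.linalg.norm(np.stack(self.states) - np.asarray(state, dtype=float), axis=1)
        novelty = np.mean(np.sort(d)[: self.k])      # mean distance to k nearest stored states
        novelty = novelty / (novelty + 1.0)          # squash to (0, 1)
        return self.sigma_min + (self.sigma_max - self.sigma_min) * novelty

    def sample(self, batch_size):
        """Sample transitions with probability inversely proportional to a
        crude local-density estimate, so rare scenes are replayed more often."""
        states = np.stack(self.states)
        counts = np.array([(np.linalg.norm(states - s, axis=1) < 1.0).sum()
                           for s in self.states], dtype=float)
        p = 1.0 / counts
        p /= p.sum()
        idx = np.random.choice(len(self.transitions), size=batch_size, p=p)
        return [self.transitions[i] for i in idx]
```

In a DDPG training loop, the returned sigma would parameterize the Ornstein-Uhlenbeck process that perturbs the actor's actions, and sample() would replace uniform replay sampling; this sketch only illustrates the state-distribution idea described in the abstract.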
王庭晗, 罗禹贡, 刘金鑫, 李克强. 基于考虑状态分布的深度确定性策略梯度算法的端到端自动驾驶策略[J]. 清华大学学报(自然科学版), 2021, 61(9): 881-888.
WANG Tinghan, LUO Yugong, LIU Jinxin, LI Keqiang. End-to-end self-driving policy based on the deep deterministic policy gradient algorithm considering the state distribution. Journal of Tsinghua University(Science and Technology), 2021, 61(9): 881-888.