Journal of Tsinghua University (Science and Technology)  2021, Vol. 61 Issue (9): 881-888    DOI: 10.16511/j.cnki.qhdxxb.2020.22.038
王庭晗, 罗禹贡, 刘金鑫, 李克强
清华大学 汽车安全与节能国家重点实验室, 北京 100084
End-to-end self-driving policy based on the deep deterministic policy gradient algorithm considering the state distribution
WANG Tinghan, LUO Yugong, LIU Jinxin, LI Keqiang
State Key Laboratory of Automotive Safety and Energy, Tsinghua University, Beijing 100084, China
Abstract: End-to-end control is one approach to self-driving. Driving scenarios are diverse and their features differ considerably, which makes it difficult to choose how quickly the exploration randomness should decay when training an end-to-end self-driving method based on reinforcement learning. If the randomness decays too quickly, the algorithm cannot obtain a reasonable policy for new scenarios; if it decays too slowly, the algorithm cannot converge quickly. To address this problem, this paper proposes a random policy and experience replay method based on screening the input state distribution. The distance between the current input state and the saved states is compared, and different random process parameters are selected according to that distance. In addition, the replay probability of scenes that occur less frequently is increased during experience replay. Simulations show that the algorithm retains sufficient exploration ability in the later stage of training when it encounters scenes whose data distribution differs greatly from that of the early stages, which improves the lane-keeping ability of the end-to-end self-driving policy based on the deep deterministic policy gradient (DDPG) algorithm under entirely new conditions.
Key words: end-to-end self-driving    reinforcement learning    random policy    experience replay
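The two ideas summarized in the abstract, distance-dependent exploration and a higher replay probability for rare scenes, can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean nearest-neighbour distance, the single threshold, the two noise scales, and the novelty-proportional weights are all assumptions made for illustration, and every function name below is hypothetical.

```python
import math
import random


def nearest_distance(state, saved_states):
    """Euclidean distance from `state` to its nearest saved state (assumed metric)."""
    if not saved_states:
        return math.inf  # nothing stored yet, so every state counts as novel
    return min(math.dist(state, s) for s in saved_states)


def noise_scale(distance, low=0.05, high=0.5, threshold=1.0):
    """Use a larger exploration-noise scale for states far from the saved ones.

    `low`, `high`, and `threshold` are illustrative values, not from the paper.
    """
    return high if distance > threshold else low


def replay_weights(novelties, eps=1e-3):
    """Normalized sampling weights that grow with each transition's novelty.

    `novelties` must be finite (e.g. the distance recorded at insertion time).
    """
    total = sum(n + eps for n in novelties)
    return [(n + eps) / total for n in novelties]


def sample_batch(buffer, novelties, batch_size, rng=random):
    """Sample transitions with replacement, favouring rarely seen scenes."""
    return rng.choices(buffer, weights=replay_weights(novelties), k=batch_size)
```

In a DDPG training loop, the value returned by `noise_scale` could serve, for example, as the standard deviation of the action noise process, while `sample_batch` would stand in for uniform sampling from the replay buffer. Both uses are assumptions about how such a sketch might be wired in.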
Received: 2020-09-08      Published: 2021-08-21
Corresponding author: LI Keqiang, Professor,     E-mail:
王庭晗, 罗禹贡, 刘金鑫, 李克强. 基于考虑状态分布的深度确定性策略梯度算法的端到端自动驾驶策略[J]. 清华大学学报(自然科学版), 2021, 61(9): 881-888.
WANG Tinghan, LUO Yugong, LIU Jinxin, LI Keqiang. End-to-end self-driving policy based on the deep deterministic policy gradient algorithm considering the state distribution. Journal of Tsinghua University (Science and Technology), 2021, 61(9): 881-888.
