Reinforcement Learning Methods for Autonomous Driving in Intersection Navigation


	Reinforcement Learning Methods for Autonomous Driving in Intersection Navigation
	刘育琦 (Liu Yuqi)
	2024-05-12
Pages	136
Degree type	Doctoral
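Abstract (translated from Chinese)	As a core component of modern autonomous driving systems, the driving policy and the way it is designed are critical to safe and efficient vehicle operation. Traditional driving policy design relies mostly on hand-crafted rules or heuristic algorithms and therefore has many shortcomings in complex interactive scenarios, such as heavy dependence on engineering experience and rule systems that are hard to maintain, so more intelligent and efficient solutions are needed. Deep reinforcement learning, which combines deep learning with reinforcement learning, offers perception, feature extraction, and decision-making capabilities together, and has shown outstanding performance in many complex dynamic decision-making environments such as video games, board games, and robot control. Trained through interaction with the environment, deep reinforcement learning can extract environment features effectively, learn autonomously through trial and error, and keep improving its policy, showing good adaptability to the environment. However, for the complex interactive scenarios in autonomous driving, driving decision-making based on deep reinforcement learning still faces many challenges in safety, post-hoc interpretability, and generalization. To address these challenges, this thesis takes intersections under dense urban traffic flow as the typical interactive scenario and studies driving policy design with deep reinforcement learning in complex environments. To meet the training and testing needs of reinforcement learning methods, it proposes an intersection scenario set that includes rule-based and reinforcement-learning-based baseline algorithms, providing a workable testing and evaluation framework for the intersection navigation task. To meet the safety and interpretability requirements of autonomous driving policies and the multi-task requirements of intersection navigation, it proposes a safety layer model that corrects dangerous actions output by the policy, a policy network design that incorporates the attention mechanism, and a state representation that incorporates task encoding, achieving experimental performance that exceeds existing state-of-the-art methods. To address scenario generalization of autonomous driving policies, it proposes a lane graph vectorization method based on high-definition maps and validates it in multiple generalization test scenarios. The main chapters cover the following work and contributions:

Construction of an intersection scenario set for deep reinforcement learning methods. To meet the need for a reinforcement-learning-oriented scenario set of intersections under dense traffic flow in a simulation environment, the thesis first analyzes why such a scenario set is needed and the design principles for an intersection scenario set. Starting from existing standards, it then designs a set of deterministic test scenarios. Next, for the deterministic training scenarios, it proposes a traffic flow generation method based on random processes and truncated Gaussian distributions. In addition, it builds stochastic training and test scenarios based on the simulator's built-in autonomous driving policy, and on this basis proposes a set of generalization test scenarios for examining the generalization ability of autonomous driving agents. Rule-based methods and reinforcement-learning-based methods are then designed as baselines. Finally, the evaluation metrics used by the scenario set and the reinforcement learning training method are introduced, the experimental results are analyzed in detail, and the performance of the reinforcement learning methods and the rule-based methods in intersection scenarios is compared.

A multi-task reinforcement learning method based on a safety layer model and the attention mechanism. To address the safety problems that reinforcement-learning-based autonomous driving agents face during training and testing, the concept of time to collision is first extended to an instantaneous time to collision, which is used to construct the safety constraint of a constrained Markov optimization problem. Theoretical analysis yields a method for correcting actions to safe ones under a first-order approximation of the safety constraint, and on this basis a safety layer network is trained to correct dangerous actions output by the policy. Then, to improve the interpretability of the driving policy, the attention mechanism is introduced into the policy network design so that attention weights over key interacting vehicles can be learned adaptively. In addition, to handle the multi-task navigation requirements of intersection scenarios, task encoding is introduced into the reinforcement learning state representation to build a multi-task training framework. Finally, experiments are conducted in two simulation environments; the results show that the proposed method outperforms existing state-of-the-art methods. Comparative experiments on the scenario set proposed in the previous chapter further demonstrate the effectiveness of the proposed safe reinforcement learning method and its ability to handle multi-task scenarios.

A reinforcement learning method based on a lane graph vectorized state representation. To address the generalization of driving policies to unseen scenarios, lane information is first extracted from the high-definition map and an undirected lane graph is built on a discretized set of lane nodes. Taking into account the different ways lane nodes are connected and their long-range correlations, the graph convolution operation is improved: on one hand, graph convolution is performed separately for each type of adjacency matrix; on the other hand, dilated convolution is introduced to enlarge the receptive field. Then, lane features and vehicle motion features are fused through a spatial attention mechanism, and the fused features are used as the reinforcement learning state representation. To deal with the sparse reward problem of the intersection environment, the advantage experience replay method is introduced to improve agent training. Finally, the agent is trained and tested on the proposed scenario set; the experimental results demonstrate the effectiveness of the proposed method in the generalization test scenarios, and typical case analyses also show that the proposed method outperforms existing state-of-the-art methods.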
English abstract	As a core component of modern autonomous driving systems, the driving strategy and its design play a crucial role in the safe and efficient operation of vehicles. Traditional design methods mostly rely on hand-crafted rules or heuristic algorithms, which have many shortcomings in complex interactive scenarios, such as heavy dependence on engineering experience and difficulty in maintaining rule-based systems. More intelligent and efficient solutions are therefore needed. Deep reinforcement learning, which integrates deep learning and reinforcement learning, combines perception, feature extraction, and decision-making capabilities. It has shown exceptional performance in many complex dynamic decision-making environments, such as video games, board games, and robot control. Trained through interaction with the environment, a deep reinforcement learning agent can effectively extract environment features, learn autonomously, and continuously improve its strategy through trial and error, demonstrating good adaptability to the environment. However, for the complex interactive scenarios in autonomous driving, driving policies based on deep reinforcement learning still face many challenges in safety, interpretability, and generalization. To address these challenges, this thesis studies the design of driving strategies using deep reinforcement learning in complex environments, taking intersections under dense urban traffic as the typical interactive scenario. To meet the training and testing requirements of reinforcement learning methods, it proposes an intersection scenario set that includes rule-based and reinforcement-learning-based baseline algorithms, providing a testing and evaluation system for intersection navigation tasks. In response to the safety and interpretability requirements of autonomous driving strategies, as well as the multi-task requirements of intersection navigation, it proposes a safety layer model for correcting dangerous actions in policy outputs, a policy network design method that integrates the attention mechanism, and a state representation method that integrates task encoding, achieving experimental performance exceeding existing state-of-the-art methods. To address the problem of scenario generalization in autonomous driving strategies, a lane graph vectorization method based on high-definition maps is proposed and validated in multiple generalization testing scenarios. The main chapters of this thesis include the following work and contributions:

Construction of an intersection scenario set for deep reinforcement learning methods. To meet the need for a reinforcement learning scenario set for intersections under dense traffic flow in a simulation environment, the necessity of deploying a scenario set is first analyzed, and the design principles for an intersection scenario set are discussed. Based on existing standards, a set of deterministic testing scenarios is designed. For the deterministic training scenarios, a traffic flow generation method based on random processes and truncated Gaussian distributions is proposed. Random training and testing scenarios are created based on the built-in autonomous driving strategies of the simulator. On this basis, a set of generalization testing scenarios is introduced to evaluate the agent's generalization capability. Furthermore, to allow algorithm performance comparisons, rule-based methods and reinforcement learning methods are designed as baseline methods. Finally, the evaluation metrics used in the scenario set, as well as the training methods and processes for reinforcement learning, are introduced. The experimental results are analyzed in detail, and the performance of reinforcement learning agents and rule-based methods in intersection scenarios is compared.

Multi-task reinforcement learning method based on a safety layer model and the attention mechanism. To address the safety issues faced by reinforcement-learning-based autonomous driving agents during training and testing, the concept of time to collision is first extended to an instantaneous time to collision, which is used to construct the safety constraints of a constrained Markov optimization problem. Through theoretical analysis, a safe action correction method under a first-order approximation of the safety constraint is derived, and a safety layer network is trained to correct the dangerous actions generated by the policy. Then, the attention mechanism is introduced into the design of the policy network, enabling adaptive learning of attention weight allocation for key interacting vehicles. By incorporating task encoding into the state representation of reinforcement learning, a multi-task training framework is constructed. Finally, experiments are conducted in two simulation environments. The results show that the proposed method outperforms existing state-of-the-art methods. Comparative experiments are also conducted using the proposed scenario set, demonstrating the effectiveness of the proposed reinforcement learning method in handling multi-task scenarios.

Reinforcement learning method based on a lane graph vectorization state representation. To address the problem of driving strategy generalization in unseen scenarios for autonomous driving agents, lane information is first extracted from high-definition maps, and an undirected lane graph is established based on a discretized set of lane nodes. Considering the different types of connections between nodes and the long-distance correlations, the graph convolutional network is improved. On the one hand, graph convolution operations are performed based on different adjacency matrices. On the other hand, dilated convolutions are introduced to expand the receptive field of graph convolution operations, aiming to extract relevant features of the lanes. Then, the lane features are fused with vehicle motion features and used as the state representation for reinforcement learning. To address the problem of sparse rewards in the environment, the advantage experience replay method is introduced for training the reinforcement learning agent. Finally, the agent is trained and tested on the proposed scenario set. The experimental results confirm the effectiveness of the proposed method, and the results in generalization scenarios demonstrate that the lane vectorization method effectively improves the algorithm's generalization performance. Case studies also show that the proposed method outperforms existing state-of-the-art methods.
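Keywords	deep reinforcement learning; autonomous driving; intersection navigation; scenario set; safe reinforcement learning; graph convolutional network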
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/57121
Collection	Graduates_Doctoral dissertations
Recommended citation (GB/T 7714)	刘育琦. 面向交叉路口通行的自动驾驶强化学习方法[D],2024.

Files in this item
File name/size	Document type	Version type	Access type	License
博士学位论文_刘育琦_签字版.pdf (24247KB)	Thesis		Restricted access	CC BY-NC-SA