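Overtaking and Lane-Changing Decision-Making Method Based on Deep Reinforcement Learning (基于深度强化学习的超车换道决策方法)
Wang Junjie (王俊杰)
2023-05-23
Pages: 124
Degree type: Doctoral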
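Chinese abstract (translated)

As a core component of modular intelligent driving systems, driving decision-making is critical to improving road traffic safety and driving efficiency. Traditional decision-making methods, however, often rely on hand-designed rules and heuristic algorithms and fall short in complex scenarios: they depend heavily on engineering experience and are difficult to maintain, so more efficient solutions are needed. Deep reinforcement learning, which combines deep learning with reinforcement learning, has strong perception and decision-making capabilities and has performed well in many complex decision-making environments. Through data-driven training, it can extract environmental features effectively, learn and optimize policies autonomously through interaction with the environment, and adapt well to that environment. Even so, in complex interactive scenarios, driving decision-making based on deep reinforcement learning still faces open challenges in safety, efficiency, and generalization. To address them, this thesis takes overtaking and lane changing as a typical interactive scenario and studies deep reinforcement learning decision-making in complex environments.

To address the safety and efficiency of decision outputs, this thesis proposes a rule-constrained deep reinforcement learning decision-making method, designs a dual-representation method that combines vector attention and image attention, and builds a testing and evaluation system for the overtaking and lane-changing task. To address the low sample efficiency of model-free methods, it introduces model-based reinforcement learning and proposes a dynamic-horizon value expansion method to handle model uncertainty. To address dynamics generalization in model-based reinforcement learning, it proposes a prototypical context-aware dynamics generalization method and verifies its zero-shot generalization in overtaking and lane-changing scenarios with different traffic densities and weather conditions.

The main chapters of the thesis contain the following work and contributions: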

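Model-free reinforcement learning for overtaking and lane changing: methods, testing, and evaluation. This part addresses the safety and efficiency of decision outputs. First, to safeguard the decision output, a rule-constrained deep reinforcement learning decision-making framework is proposed, in which rules correct unreasonable outputs of the learned policy; the framework improves safety over using rule-based or learning-based decisions alone. Second, to enrich the state representation, a dual-representation fusion framework combining vector attention and image attention is proposed; the fused representation yields better decision-making performance than either form of state representation used alone. Finally, to put the comparison of methods on a common footing, a reinforcement learning training, testing, and evaluation system for overtaking and lane-changing scenarios is established, several typical cases are designed, and the proposed methods are tested and evaluated within it.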
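The idea of rules correcting unreasonable learned outputs can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the thesis: the three-action set, the `LaneGaps` structure, the `min_gap` threshold, and the fallback to lane keeping.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical discrete action set; the abstract does not list the real actions.
KEEP_LANE, CHANGE_LEFT, CHANGE_RIGHT = 0, 1, 2


@dataclass
class LaneGaps:
    """Free gap in metres to the nearest vehicle in each adjacent lane.

    None means the adjacent lane does not exist.
    """
    left: Optional[float]
    right: Optional[float]


def rule_constrained_action(policy_action: int, gaps: LaneGaps,
                            min_gap: float = 10.0) -> int:
    """Pass the learned action through unless a simple rule rejects it.

    A lane change is rejected (replaced by KEEP_LANE) when the target lane
    is missing or its free gap is below min_gap (an assumed threshold).
    """
    if policy_action == CHANGE_LEFT:
        target_gap = gaps.left
    elif policy_action == CHANGE_RIGHT:
        target_gap = gaps.right
    else:
        return policy_action  # lane keeping is never overridden in this sketch
    if target_gap is None or target_gap < min_gap:
        return KEEP_LANE
    return policy_action
```

In this sketch the rule only vetoes; the learned policy still chooses among the actions the rule allows, which is one common way to combine the two.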

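Model-based dynamic-horizon value expansion for overtaking and lane changing. This part addresses model uncertainty in model-based methods. First, the influence of the rollout horizon on the performance of model-based reinforcement learning is analyzed experimentally. Second, a model-based dynamic-horizon value expansion framework is proposed, comprising latent-space imagination with a world model and value expansion over a dynamic rollout horizon; the relationship between the reinforcement learning target value estimate and the rollout horizon under model uncertainty is analyzed theoretically. Third, a method for detecting the reliability of the model's rollout horizon is proposed: the value expansion error between reconstructed and original images at each rollout horizon reflects the reliability of that horizon, which allows the world model's rollout horizon to be adjusted adaptively. Finally, experiments on benchmark visual control tasks show that the proposed method achieves higher sample efficiency and better final performance than existing techniques. Tests under the overtaking and lane-changing evaluation system further verify the method's effectiveness and scalability, with improvements on multiple lane-changing performance metrics over advanced model-free and model-based methods.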
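A generic H-step value expansion target, with the horizon chosen from per-step error estimates, can be sketched as follows. This is not the thesis's algorithm: the threshold rule in `select_horizon` and the precomputed `step_errors` are assumptions standing in for its reliability detection, which the abstract describes only in outline.

```python
from typing import Sequence


def h_step_target(rewards: Sequence[float], values: Sequence[float],
                  horizon: int, gamma: float = 0.99) -> float:
    """Standard H-step value expansion target.

    rewards[t] is the imagined reward at step t, and values[h] is the value
    estimate of the state reached after h imagined steps (values[0] is the
    start state). horizon=0 falls back to the plain value estimate.
    """
    discounted = sum((gamma ** t) * rewards[t] for t in range(horizon))
    return discounted + (gamma ** horizon) * values[horizon]


def select_horizon(step_errors: Sequence[float], threshold: float) -> int:
    """Longest rollout prefix whose per-step error stays within threshold.

    step_errors is a hypothetical per-step reliability signal; the rollout
    is cut at the first step whose error exceeds the assumed threshold.
    """
    horizon = 0
    for err in step_errors:
        if err > threshold:
            break
        horizon += 1
    return horizon
```

For example, `select_horizon([0.1, 0.2, 0.9, 0.1], threshold=0.5)` cuts the rollout before the third step, and the resulting horizon is then passed to `h_step_target`.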

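Prototypical context-aware dynamics generalization for overtaking and lane changing. This part addresses the dynamics generalization of the model. First, the generalization problem of reinforcement learning with high-dimensional inputs is formalized. Second, a prototypical context-aware dynamics model is proposed, which optimizes the latent-space world model with a temporally consistent prototype regularizer so that the latent-space model learns more effective representations for dynamics generalization. Third, a novel environmental context representation is designed: the learned prototypes are weighted by the probabilities, output by a projection network, of matching each prototype, and the result is combined with the latent state and the projection network output to form the environmental context feature; the effectiveness of this feature in the policy and value networks is verified. Finally, a series of benchmark environments with different dynamics is constructed and divided into training and test sets. Experiments show that the proposed method generalizes better on zero-shot visual control tasks; experiments on autonomous driving overtaking and lane-changing generalization tasks (covering different traffic densities and weather conditions) further confirm this advantage, and performance comparisons under extreme weather conditions show that it also generalizes to complex environments.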
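The context feature described above (prototypes weighted by matching probabilities, then combined with the latent state and the projection output) might look roughly like the sketch below. The dot-product scoring, the softmax with a `temperature` parameter, and plain concatenation are assumptions chosen for illustration; the abstract does not specify how matching probabilities are computed or how the parts are integrated.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over similarity scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def context_feature(latent: Sequence[float], projection: Sequence[float],
                    prototypes: Sequence[Sequence[float]],
                    temperature: float = 0.1) -> List[float]:
    """Concatenate latent state, projection output, and weighted prototypes.

    Matching probabilities are a softmax over dot products between the
    projection output and each prototype (an assumed similarity measure).
    """
    scores = [sum(p * c for p, c in zip(projection, proto))
              for proto in prototypes]
    probs = softmax(scores, temperature)
    dim = len(prototypes[0])
    weighted = [sum(probs[k] * prototypes[k][d] for k in range(len(prototypes)))
                for d in range(dim)]
    return list(latent) + list(projection) + weighted
```

The returned vector would then be fed to the policy and value networks in place of the latent state alone.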

English abstract

As a core component of modular intelligent driving systems, driving decision-making plays a crucial role in improving road traffic safety and driving efficiency. However, traditional methods rely on hand-designed rules and heuristics, which fall short in complex scenarios: they depend heavily on engineering experience and are difficult to maintain. More efficient solutions are therefore needed. Deep reinforcement learning, which combines deep learning and reinforcement learning, offers strong perception and decision-making capabilities and has demonstrated excellent performance in many complex decision-making environments. Through data-driven training, it extracts environmental features effectively, learns and optimizes policies autonomously by interacting with the environment, and adapts well to it. Even so, in complex interactive scenarios, driving decision-making based on deep reinforcement learning still faces many challenges in safety, efficiency, and generalization. To tackle these challenges, this thesis takes lane changing as a typical interactive scenario and investigates decision-making in complex environments using deep reinforcement learning.

To address the safety and efficiency of decision outputs, this thesis proposes a rule-constrained deep reinforcement learning decision-making method, designs a dual-representation method combining vector-based attention and image-based attention, and constructs a testing and evaluation system for lane-changing tasks. To address the low sample efficiency of model-free methods, model-based reinforcement learning is introduced, and a dynamic-horizon value expansion method is proposed to handle model uncertainty. To address the dynamics generalization problem in model-based reinforcement learning, a prototypical context-aware dynamics generalization method is proposed, and its zero-shot generalization capability is validated in lane-changing scenarios with different traffic densities and weather conditions.

The main chapters of this thesis include the following work and contributions:

Model-free reinforcement learning for lane changing: methods and evaluation. This part addresses the safety and efficiency of decision outputs. A rule-constrained deep reinforcement learning decision-making framework is proposed, which improves safety compared with using rule-based or learning-based decisions alone. A dual-representation fusion framework is proposed to enrich the state representation; it achieves better decision-making performance than a single form of state representation. In addition, a reinforcement learning training, testing, and evaluation system is established for lane-changing scenarios, and several typical cases are designed to test and evaluate the proposed methods within it.

Model-based reinforcement learning for lane changing: dynamic-horizon value expansion. This part addresses model uncertainty in model-based methods. The influence of the rollout horizon on the performance of model-based reinforcement learning is analyzed experimentally. A model-based dynamic-horizon value expansion method is proposed, comprising latent-space imagination with a world model and value expansion over a dynamic horizon, and the impact of the rollout horizon on reinforcement learning value estimation under model uncertainty is analyzed theoretically. A method for detecting the reliability of the model's rollout horizon is proposed: the reliability of each rollout horizon is reflected by the value expansion error between the reconstructed and original images at each imagination step, which enables adaptive adjustment of the world model's rollout horizon. Experiments on benchmark visual control tasks show that the proposed method has higher sample efficiency and better final performance than existing techniques. Test results in the lane-changing decision-making evaluation system further verify the effectiveness and scalability of the method, which improves multiple lane-changing performance indicators compared with advanced model-free and model-based methods.

Model-based reinforcement learning for lane changing: prototypical context-aware dynamics generalization. This part addresses the dynamics generalization problem of the model. The generalization problem of reinforcement learning with high-dimensional inputs is formalized. A prototypical context-aware dynamics model is proposed, which optimizes the latent-space world model by introducing a temporally consistent prototype regularizer, enabling the latent-space model to learn more effective representations for dynamics generalization. A novel environmental context representation is designed, and its effectiveness in the policy and value networks is verified. Experiments show that the proposed method generalizes better on zero-shot visual control tasks, and this advantage is further validated in autonomous driving lane-changing generalization tasks (covering different traffic flow densities and weather conditions). Performance comparisons under extreme weather conditions also show that the method generalizes to complex environments.

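Keywords: deep reinforcement learning, autonomous driving, lane-change decision-making, model-based value expansion, dynamics generalization
Language: Chinese
Seven major directions, sub-direction classification: Reinforcement and evolutionary learning
State key laboratory planned direction classification: Intelligent computing and learning
Is there a thesis-associated dataset that needs to be deposited
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/52166
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems_Deep Reinforcement Learning
State Key Laboratory of Multimodal Artificial Intelligence Systems
Corresponding author: Wang Junjie (王俊杰)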
Recommended citation
GB/T 7714
Wang Junjie. Overtaking and Lane-Changing Decision-Making Method Based on Deep Reinforcement Learning [D], 2023.
Files in this item
File name/size: 学位论文_王俊杰_签字.pdf (17475KB)
Document type: Thesis
Access type: Open access
License: CC BY-NC-SA