Deep Reinforcement Learning Methods for Vision-Based Control in Intelligent Driving (面向智能驾驶视觉控制的深度强化学习方法)
李栋
Subtype: Doctoral Thesis (博士)
Thesis Advisor: 赵冬斌
2019-05-22
Degree Grantor: 中国科学院大学 (University of Chinese Academy of Sciences)
Place of Conferral: 中国科学院自动化研究所 (Institute of Automation, Chinese Academy of Sciences)
Degree Name: Doctor of Engineering (工学博士)
Degree Discipline: Control Theory and Control Engineering (控制理论与控制工程)
Keyword: Deep Reinforcement Learning; Intelligent Driving; Visual Control; Object Detection; Graph Attention Network
Abstract

Intelligent driving technology can free human drivers from complex and monotonous driving tasks; owing to its intelligence and efficiency, it is regarded as a revolutionary technology leading the next generation of intelligent transportation systems. Existing perception and control schemes for intelligent driving rely not only on cameras but also on sensors such as LiDAR and millimeter-wave radar, and complete control with hand-crafted driving rules. However, the high cost of LiDAR and millimeter-wave radar, together with the inherent limitations of these sensors, has delayed the large-scale commercialization of intelligent driving. In addition, rule-based control schemes fall short in adaptability and intelligence. Vision-based control schemes achieve intelligent vehicle control while reducing dependence on expensive sensors, and have become the latest research focus in the field of intelligent driving. Nevertheless, efficiently and accurately perceiving the surrounding traffic environment from image data and designing data-efficient intelligent driving control policies still pose many difficulties and challenges.

Building on a review of the current state of research, this thesis conducts an in-depth study of deep learning and reinforcement learning methods for the visual control problem in intelligent driving. It first focuses on recognizing traffic signs at long range ahead of the vehicle and extracting key road features at short range, and then, based on the visual perception results, studies vehicle lateral control and lane-change decision making with reinforcement learning methods. Furthermore, to address the slow convergence of reinforcement learning control policies, data-efficient deep reinforcement learning methods are proposed based on Gaussian process and graph neural network theory, accelerating algorithm convergence and improving control performance. The main chapters contain the following work and contributions:

  1. For environment perception with visual input, a multi-stage deep learning method for traffic sign recognition in video streams is proposed. To address the small size of traffic signs, the strong correlation among samples, and the uneven distribution of sample counts across classes, the method separates traffic sign detection from fine-grained classification in multiple stages, effectively alleviating over-fitting. Building on the recognition results, a traffic sign tracking method that mines the temporal context of the video stream is proposed, improving recognition precision and recall. Finally, the effectiveness of the method is validated on a traffic sign recognition dataset.

  2. For vision-based vehicle lateral control, a multi-task reinforcement learning visual control method is proposed. Considering the relations among multiple road features and their relation to lateral control, a multi-task learning convolutional neural network extracts the features shared among related road features, improving the prediction accuracy of key road features. For the reinforcement learning lateral control problem, a reward function is designed from the road geometry, and vision-based lane-keeping control is successfully achieved with the deterministic policy gradient method. The performance of the proposed method is validated against existing perception and control methods in a simulation environment.

  3. To address the slow policy convergence of model-free reinforcement learning methods, a perturbed Gaussian process modelling reinforcement learning method with chunk data input is proposed. The state transition and reward functions of the reinforcement learning environment are approximated over a local time window, and virtual exploration samples are generated in the state-action space under the current control policy; these are used together with the agent's real interaction samples to update the policy, accelerating convergence. In addition, in the Gaussian process modelling stage, the traditional single-sample-input Gaussian process method is improved so that it accepts mini-batch inputs while effectively preventing the loss of model-tracking ability caused by vanishing Gaussian process uncertainty. Finally, the modelling accuracy and the speed-up of reinforcement learning convergence are validated in simulation.

  4. To address the slow policy convergence caused by the reactive learning mechanism of existing reinforcement learning methods, in which the agent passively receives environment state inputs, a graph attention based deep reinforcement learning visual control method is proposed. Using prior knowledge from environment exploration, a topological graph of the environment is constructed via deep learning, and on this basis a recursive graph attention feature extraction method is proposed that aggregates multi-node features from the topological graph as auxiliary prior features, improving the convergence speed of the control policy. Finally, experiments comparing the proposed method with existing visual control methods validate its effectiveness.

  5. For two common intelligent driving scenarios: for highway lane-change decision making, a reinforcement learning based lane-change decision method is proposed, with a reward function that considers both the necessity and the comfort of a lane change, realizing lane-change timing selection and overtaking. For signalized intersections on structured urban roads, a vision-based stop-and-go speed control method is proposed, realizing stopping at red lights and proceeding at green. Finally, the effectiveness of both methods is validated experimentally.

Other Abstract

Owing to its intelligence and high efficiency, autonomous driving can free human drivers from complex and tedious driving work and is regarded as the next revolution in intelligent transportation systems. Current autonomous driving perception methods depend not only on the camera but also on LiDAR and millimeter-wave radar, and a rule-based control strategy is then employed in the control module. The high cost of the radar sensors makes large-scale commercial application difficult, and rule-based control methods are limited in adaptability and intelligence. In contrast, vision-based control methods depend only on the low-cost onboard camera and have attracted great research attention. However, accurately perceiving the environment from visual input and designing an intelligent control policy still present many difficulties and challenges.

In this thesis, we first review the current research status and related work, then develop our autonomous driving visual control policies based on deep learning and reinforcement learning methods. In detail, we first focus on the long-range traffic sign recognition problem and the short-range key track feature extraction problem. Then, reinforcement learning methods with visual feature input are proposed to tackle the vehicle lateral control and lane-change decision-making problems. Moreover, to address the low data efficiency of model-free deep reinforcement learning methods, we propose two novel algorithms to accelerate the learning process and improve data efficiency: a Gaussian process based and a graph attention network based deep reinforcement learning method. The main contributions of this thesis are as follows:

  1. For the visual traffic scene perception task, we propose a multi-stage video-based traffic sign recognition algorithm built on deep neural networks. To address the three main challenges of traffic sign recognition, i.e. the small size of the targets, the strong correlation among video frames, and the imbalanced sample distribution across traffic sign classes, we design a multi-stage method comprising a traffic sign detection stage and a fine-grained classification stage to tackle the over-fitting issue. Moreover, we exploit the temporal features of consecutive frames and propose a traffic sign tracking algorithm to enhance the precision and recall of the recognition results. Finally, we validate the effectiveness of the proposed method on a public traffic sign detection dataset.
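As a concrete illustration of the temporal idea in contribution 1, the sketch below links per-frame detections into tracks by box overlap and labels each track by majority vote, so a single misclassified frame is outvoted by its neighbors. The function names, box format, and threshold are our own hypothetical stand-ins (the trained detection and fine-grained classification networks are represented only by their per-frame outputs); this is not the thesis implementation.

```python
from collections import Counter

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def link_tracks(frames, iou_thresh=0.5):
    """Greedily link per-frame detections (box, class) into tracks by
    overlap with each track's latest box, then label every track by
    majority vote over its per-frame class predictions."""
    tracks = []  # each track: {"boxes": [...], "labels": [...]}
    for detections in frames:
        for box, label in detections:
            for t in tracks:
                if iou(t["boxes"][-1], box) >= iou_thresh:
                    t["boxes"].append(box)
                    t["labels"].append(label)
                    break
            else:
                tracks.append({"boxes": [box], "labels": [label]})
    return [Counter(t["labels"]).most_common(1)[0][0] for t in tracks]
```

Voting over a track is one way tracking can raise both precision and recall relative to independent per-frame recognition.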

  2. For the vision-based vehicle lateral control problem, we propose a visual control method combining multi-task learning and reinforcement learning. First, considering the relations among multiple track feature prediction tasks, a multi-task learning convolutional neural network is designed to extract the essential track features. Then, based on the extracted track features and the host car state, we design a track geometry based reward function to drive the reinforcement learning process, and successfully learn a lane-keeping policy with the deterministic policy gradient algorithm. Finally, we compare the proposed method with several baselines on the TORCS simulator to validate its effectiveness.
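A minimal sketch of a track-geometry reward of the kind named in contribution 2, assuming a common shape that rewards speed along the lane axis and penalizes heading error and lateral deviation; the exact terms and weights used in the thesis may differ.

```python
import math

def lane_keep_reward(speed, yaw_err, lat_dev):
    """Track-geometry reward sketch: `speed` is the vehicle speed (m/s),
    `yaw_err` the heading angle relative to the lane axis (rad), and
    `lat_dev` the lateral offset from the lane center, normalized to
    [-1, 1]. Forward progress along the lane is rewarded; heading error
    and lateral deviation are penalized."""
    return speed * (math.cos(yaw_err) - abs(math.sin(yaw_err)) - abs(lat_dev))
```

Driving fast, straight, and centered maximizes this quantity, so a deterministic policy gradient learner is pushed toward lane keeping rather than raw speed.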

  3. To address the low data efficiency of model-free reinforcement learning, we propose a perturbed Gaussian process modelling method with chunk data input to accelerate the training process. By modelling the state transition function and reward function over a short local time interval, the agent can explore the state-action space and generate large numbers of imaginary samples under the current control policy. Both the real interaction samples and the imaginary samples are then used to train the policy and accelerate its convergence. Additionally, in the Gaussian process modelling stage, we improve the conventional single-point-input Gaussian process method so that it accepts mini-batch samples and avoids the uncertainty vanishing problem. Finally, we validate the proposed method in terms of modelling accuracy and training acceleration.
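The model-based acceleration loop of contribution 3 can be sketched as follows, with a toy RBF-kernel Gaussian process standing in for the perturbed chunk-input formulation (the `noise` jitter term loosely plays the role of keeping the posterior from collapsing). The class and function names are ours, and the state is one-dimensional for brevity; this is a sketch under those assumptions, not the thesis algorithm.

```python
import numpy as np

def rbf(X1, X2, ls=1.0):
    """Squared-exponential kernel between row-vector sets X1 and X2."""
    d = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d / ls ** 2)

class LocalGPModel:
    """Toy GP regression fitted on a short window of (state, action) ->
    next-state samples, used as a local dynamics model."""
    def __init__(self, ls=1.0, noise=1e-3):
        self.ls, self.noise = ls, noise

    def fit(self, X, y):
        self.X = X
        K = rbf(X, X, self.ls) + self.noise * np.eye(len(X))
        self.alpha = np.linalg.solve(K, y)

    def predict(self, Xs):
        return rbf(Xs, self.X, self.ls) @ self.alpha

def imaginary_rollout(model, policy, s0, steps=5):
    """Generate virtual transitions from the fitted model under the
    current policy; these are mixed with real samples for policy
    updates to speed up convergence."""
    s, out = np.atleast_1d(np.asarray(s0, dtype=float)), []
    for _ in range(steps):
        a = policy(s)
        s_next = model.predict(np.concatenate([s, a])[None, :])  # shape (1,)
        out.append((s.copy(), a, s_next.copy()))
        s = s_next
    return out
```

Because the model is only trusted over a short window, rollouts are kept short; the virtual samples supplement, rather than replace, real interaction data.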

  4. To address the reactive policy architecture of current reinforcement learning methods, which leads to low data efficiency and long training periods, we propose a graph attention based method for deep reinforcement learning visual control. First, the exploration samples are used to build a topological graph in a supervised manner. Then, a recursive graph attention method is proposed to extract attention features that help the deep reinforcement learning agent speed up learning. Finally, we validate the effectiveness of the proposed method in a visual control environment.
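One hop of the aggregation idea in contribution 4 can be illustrated as below. The attention scoring (a dot product of neighbor features against a query vector) and the recursion by stacking hops are simplifications chosen for the sketch, not the thesis's exact formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def graph_attention_hop(h, adj, q):
    """One attention hop over a topological graph: node i aggregates its
    neighbors' features, weighted by a softmax of each neighbor feature's
    dot product with a query vector q (e.g. an embedding of the current
    observation). h: (n, d) node features; adj: (n, n) boolean adjacency
    (self-loops included)."""
    out = np.zeros_like(h)
    for i in range(len(h)):
        nbrs = [j for j in range(len(h)) if adj[i, j]]
        w = softmax(np.array([h[j] @ q for j in nbrs]))
        out[i] = sum(wj * h[j] for wj, j in zip(w, nbrs))
    return out

def recursive_attention(h, adj, q, hops=2):
    """Stack hops so that features from multi-hop neighbors are pulled
    into each node's auxiliary prior feature."""
    for _ in range(hops):
        h = graph_attention_hop(h, adj, q)
    return h
```

Each hop produces a convex combination of neighbor features, so stacking hops mixes in progressively more distant parts of the explored topology.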

  5. For two common autonomous driving scenarios, i.e. the highway driving scenario and the city intersection scenario, we propose a reinforcement learning based lane-change decision-making method for the former and a vision-based stop-and-go speed control method for the latter. In the first case, besides the necessity of a lane change, the reward design also considers ride comfort to avoid frequent lane changes, realizing lane-change timing selection and overtaking. In the second case, we realize stopping at red traffic lights and driving on at green. Finally, we validate the effectiveness of the proposed methods in two simulation environments.
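A reward of the kind described for the lane-change task might combine a necessity term with a per-change comfort penalty, as in the sketch below; the functional form and the 0.5 weight are illustrative assumptions of ours, not the thesis's tuned values.

```python
def lane_change_reward(speed, desired_speed, changed_lane):
    """Reward sketch for lane-change decision making: the necessity term
    penalizes the speed deficit caused by a slow leading vehicle (so
    changing lanes to overtake eventually pays off), while the comfort
    term charges a fixed cost per lane change to discourage frequent
    changes."""
    necessity = -(desired_speed - speed) / desired_speed
    comfort = -0.5 if changed_lane else 0.0
    return necessity + comfort
```

With this shape, the agent only changes lanes when the expected speed recovery outweighs the fixed comfort cost, which is exactly the timing-selection behavior described above.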

Pages: 158
Language: Chinese (中文)
Document Type: Doctoral Thesis (学位论文)
Identifier: http://ir.ia.ac.cn/handle/173211/23944
Collection: 复杂系统管理与控制国家重点实验室_深度强化学习
Recommended Citation (GB/T 7714):
李栋. 面向智能驾驶视觉控制的深度强化学习方法[D]. 中国科学院自动化研究所. 中国科学院大学, 2019.
Files in This Item:
File Name/Size: 自动化所李栋博士学位论文.pdf (6681 KB) | DocType: Doctoral Thesis (学位论文) | Access: Open Access (开放获取) | License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.