Deep Reinforcement Learning Methods for Autonomous Exploration in Unknown Environments
李浩然
2020-08
Pages: 152
Degree type: Doctoral
Chinese abstract

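Intelligent robots are now widely used in daily production and life, gradually freeing people from heavy labor. As a typical kind of intelligent robot, the mobile robot has broad application prospects and practical significance in home service, public service, and logistics transportation. Existing mobile robot systems, especially those operating outdoors, usually rely on high-definition maps and hand-designed rules to navigate along a predetermined route. The production cost and limited coverage of high-definition maps restrict the area in which a mobile robot can travel, and rule-based decision-making struggles with the challenges of complex environments. Autonomous exploration of unknown environments based on deep reinforcement learning has therefore become a research hotspot in mobile robotics in recent years. Most current work, however, concentrates on small-scale indoor simulation environments, and efficient, safe autonomous exploration in outdoor traffic environments still faces many difficulties and challenges.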

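This thesis aims to build an autonomous exploration system for large-scale unknown traffic environments. It decomposes the construction of such a system into two sub-tasks, road perception and exploration decision, and studies both with deep learning and deep reinforcement learning methods. In the road perception module, an information aggregation network is first proposed to strengthen the connection between features, addressing the receptive field problem that deep segmentation networks face in lane detection. A scenario-based bidirectional feature fusion network is then proposed for the confidence allocation problem in multi-sensor fusion road segmentation.
In the exploration decision module, the thesis starts from exploration in indoor environments and proposes a deep reinforcement learning exploration algorithm based on a fully convolutional Q-network, which speeds up convergence and improves transfer performance. Finally, building on the road perception results, a scenario imagination mechanism is proposed for the difficult exploration problem that deep reinforcement learning faces in large-scale multi-branch traffic environments, improving exploration efficiency.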

The main chapters of the thesis contain the following work and contributions:

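1. For lane segmentation in road sign detection, an information aggregation segmentation network is proposed. Traditional segmentation networks have an insufficient receptive field for lane segmentation. Taking into account the characteristic line shape of lanes and their evenly spaced distribution, an information aggregation mechanism is proposed in which aggregation methods based on linear weighting and on attention are designed along the horizontal and vertical directions of the feature map, strengthening the spatial relationship between pixels. The method achieves highly competitive results on public lane datasets and on a rail track detection dataset.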

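2. For multi-sensor fusion in road segmentation, a bidirectional feature fusion network is proposed that performs road segmentation by fusing LiDAR and camera data, overcoming the limitations of a single sensor. To handle the spatial distortion that arises when fusing in image space and the confidence allocation problem in sensor feature fusion, a feature space transformation module between image space and the LiDAR top-view projection space is proposed, and a scenario-based fusion algorithm is designed to allocate confidence between sensors. The effectiveness of the algorithm is verified on a road segmentation dataset.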

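3. For autonomous robot exploration in unknown indoor environments, an exploration algorithm based on deep reinforcement learning is proposed. First, to address the low learning efficiency and poor transferability of existing deep reinforcement learning-based control algorithms, an autonomous exploration framework for mobile robots is built that divides exploration into three sub-modules: mapping, decision-making, and planning. For the decision-making module, a multi-task fully convolutional Q-network is proposed to accelerate training of the deep network. In addition, a reward function based on the change in map entropy is designed to drive the algorithm to explore the environment. The algorithm is validated through autonomous navigation and exploration experiments in simulation and through experiments with a physical robot in a real environment.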

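4. Building on the road perception results and the indoor autonomous exploration algorithm, an autonomous exploration system with a deep reinforcement learning algorithm at its core is constructed for large-scale traffic environments. To address the low learning efficiency, unstable navigation performance, and low exploration efficiency of deep reinforcement learning in large-scale multi-branch outdoor traffic environments, a new middle-level action space is first designed, and a scenario imagination mechanism is proposed to solve the difficult exploration problem in large-scale multi-branch environments. In addition, a semantically augmented point cloud matching algorithm and a semantic update algorithm based on Bayesian filtering are proposed to build a road semantic map during exploration. Finally, the algorithm's performance is verified in exploration experiments in a traffic environment.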
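
Contribution 3 describes a reward based on the change in map entropy. The abstract does not give the formula, so the following is only a minimal sketch under stated assumptions: the map is an occupancy grid of per-cell occupancy probabilities, map entropy is the sum of per-cell binary Shannon entropies, and the reward is the entropy reduction between two consecutive maps. The names `map_entropy` and `exploration_reward` are hypothetical and not taken from the thesis.

```python
import numpy as np


def map_entropy(occupancy: np.ndarray, eps: float = 1e-9) -> float:
    """Sum of per-cell binary Shannon entropy (in nats) of an occupancy grid.

    Assumption: each cell holds an occupancy probability in [0, 1];
    0.5 means the cell is still unknown.
    """
    # Clip so that log(0) never occurs for fully certain cells.
    p = np.clip(occupancy, eps, 1.0 - eps)
    return float(-np.sum(p * np.log(p) + (1.0 - p) * np.log(1.0 - p)))


def exploration_reward(prev_map: np.ndarray, new_map: np.ndarray) -> float:
    """Assumed reward: how much the latest observation reduced map entropy."""
    return map_entropy(prev_map) - map_entropy(new_map)


# Usage: a fully unknown 4x4 map, then the top half is observed as mostly free.
prev_map = np.full((4, 4), 0.5)
new_map = prev_map.copy()
new_map[:2, :] = 0.05
reward = exploration_reward(prev_map, new_map)
```

Under these assumptions, an action that reveals no new cells yields zero reward, while one that resolves unknown cells yields a positive reward, which is the kind of signal an exploration policy can be trained on.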

English abstract

Intelligent robots are now widely used in daily production and life, gradually freeing people from heavy labor. As a typical kind of intelligent robot, the mobile robot has broad application prospects and practical significance in home service, public service, and logistics transportation. Existing mobile robot systems, especially those operating outdoors, usually rely on high-definition maps (HD maps) and hand-designed rules to navigate along a given route.

However, the production cost and limited coverage of HD maps restrict the area in which a mobile robot can travel. At the same time, rule-based decision-making struggles with the challenges posed by complex environments. In recent years, autonomous exploration of unknown environments based on deep reinforcement learning has become a research hotspot in mobile robotics. Most current research, however, focuses on small-scale indoor simulation environments, and efficient, safe autonomous exploration in outdoor traffic environments still faces many difficulties and challenges.

This thesis aims to build an autonomous exploration system for large-scale unknown traffic environments, focusing on deep learning and deep reinforcement learning methods. The construction of such a system is divided into two sub-tasks: road perception and exploration decision. In the road perception module, an information aggregation network is first proposed to strengthen the connection between features, addressing the receptive field problem that deep segmentation networks face in lane detection. A scenario-based bidirectional fusion network is then proposed for the confidence allocation problem in multi-sensor fusion road segmentation. In the exploration decision module, starting from exploration in indoor environments, the thesis proposes a deep reinforcement learning exploration algorithm based on a fully convolutional Q-network, which speeds up convergence and improves the transfer performance of the algorithm. Finally, building on the road perception results, a scenario imagination mechanism is proposed for the difficult exploration problem that deep reinforcement learning faces in large-scale multi-branch traffic environments, improving the exploration efficiency of the algorithm.

The main chapters of this thesis include the following work and contributions:

1. To address lane segmentation in road sign detection, we propose an information aggregation segmentation network. Because traditional segmentation networks have a limited receptive field for lane segmentation, we propose an information aggregation mechanism that exploits the characteristic shape of lanes and their evenly spaced distribution. Aggregation methods based on linear weighting and on attention are designed along the horizontal and vertical directions of the feature map to strengthen the spatial relationship between pixels. The method achieves highly competitive results on public lane datasets and on a rail track detection dataset.

2. To address multi-sensor fusion in road segmentation, we propose a bidirectional feature fusion network that performs road segmentation by fusing LiDAR and camera data, overcoming the limitations of a single sensor. To handle spatial distortion in image space and confidence assignment in sensor fusion, we propose a dense space transformation module between image space and top-view space, and design a scenario-based fusion method that assigns confidence between the sensors. Finally, the effectiveness of the method is validated on a road segmentation dataset.

3. To address autonomous exploration in unknown indoor environments, we propose an exploration algorithm based on deep reinforcement learning. First, to overcome the low learning efficiency and poor transferability of existing deep reinforcement learning-based control algorithms, we present an autonomous exploration framework for mobile robots that divides exploration into three sub-modules: mapping, decision-making, and planning. For the decision-making module, we propose a multi-task fully convolutional Q-network to accelerate training of the deep network. In addition, a reward function based on the change in map entropy is designed to drive the algorithm to explore. The performance of the algorithm is verified by autonomous navigation experiments in simulation and by experiments with a real robot in a physical environment.

4. Building on the road perception results and the indoor autonomous exploration algorithm, we build an autonomous exploration system for large-scale traffic environments with a deep reinforcement learning algorithm as its core module. To address the low learning efficiency, unstable navigation performance, and low exploration efficiency of deep reinforcement learning algorithms in large-scale multi-branch outdoor traffic environments, we first design a novel middle-level action space and propose a scenario imagination mechanism to solve the difficult exploration problem in large-scale multi-branch environments. In addition, we propose a point cloud matching algorithm augmented with semantic labels and a semantic label update algorithm based on Bayesian filtering to build road semantic maps during exploration. Finally, the performance of the algorithm is verified in exploration experiments in a traffic environment.
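
The abstract names a semantic label update based on Bayesian filtering but gives no details. As a rough illustration only, the sketch below assumes each map cell stores a categorical belief over semantic classes and that each new observation supplies a per-class likelihood (for example, from a segmentation network's confidence). The function name `bayes_semantic_update`, the two-class `[road, non-road]` layout, and the likelihood values are all hypothetical and are not taken from the thesis.

```python
import numpy as np


def bayes_semantic_update(prior: np.ndarray, likelihood: np.ndarray) -> np.ndarray:
    """One Bayes step: posterior is proportional to likelihood * prior.

    Both arrays carry class probabilities on the last axis. The likelihood
    must be non-zero for at least one class the prior still supports,
    otherwise the normalization divides by zero.
    """
    unnormalized = prior * likelihood
    return unnormalized / unnormalized.sum(axis=-1, keepdims=True)


# Hypothetical 2x2 map with two classes per cell: [road, non-road].
belief = np.full((2, 2, 2), 0.5)

# Assumed likelihood p(observation | class) when a point is labeled "road".
obs_road = np.array([0.8, 0.2])

# Three consistent "road" observations fall into cell (0, 0).
for _ in range(3):
    belief[0, 0] = bayes_semantic_update(belief[0, 0], obs_road)

# Most likely label per cell (0 = road, 1 = non-road).
labels = belief.argmax(axis=-1)
```

Repeated consistent observations sharpen the belief of the observed cell while unobserved cells keep their prior, so occasional mislabeled points do not immediately flip a cell's label.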

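Keywords: mobile robot; deep reinforcement learning; autonomous exploration; intelligent driving; multi-sensor fusion; deep learning; semantic segmentation
Subject area: Robot control; Computer science and technology; Artificial intelligence; Computer perception
Discipline category: Engineering; Engineering::Control Science and Engineering; Engineering::Computer Science and Technology (degrees in engineering or science may be conferred)
Language: Chinese
Seven major directions, sub-direction classification: Reinforcement and evolutionary learning
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/40309
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems_Deep Reinforcement Learning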
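Recommended citation
GB/T 7714
李浩然. Deep Reinforcement Learning Methods for Autonomous Exploration in Unknown Environments [D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2020.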
Files in this item
File name/size | Document type | Version type | Access type | License
李浩然-博士论文-打印版.pdf (13496KB) | Thesis | | Open access | CC BY-NC-SA