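Thesis Advisor: 杨一平; 蔡莹皓
Degree Grantor: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Place of Conferral: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Degree Discipline: Computer Application Technology
Keyword: local feature descriptor; monocular image depth estimation; camera pose estimation; octree map; semantic map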

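In recent years, Simultaneous Localization and Mapping (SLAM) has attracted wide attention as a key technology for autonomous navigation of mobile robots. Monocular SLAM has become a major research focus in visual SLAM because of its simple structure, low cost, flexibility, and extensibility. However, monocular SLAM can usually build only sparse or semi-dense scene maps and cannot provide the dense maps needed for applications such as robot navigation and obstacle avoidance. This thesis studies simultaneous localization and dense semantic map construction based on depth estimation. The main work and contributions are summarized as follows: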
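3. To further improve the accuracy of depth estimation and camera pose estimation, we propose a self-iterative optimization method for depth estimation and pose computation that combines traditional geometry with self-supervised deep learning. On the one hand, pseudo RGB-D SLAM is run on depth images predicted by a neural network; with the pseudo-depth information and SLAM's robust optimization algorithms and loop closure detection, it yields camera poses and sparse map points that are more reliable than those of monocular SLAM. On the other hand, the sparse map points created by monocular SLAM guide image depth estimation and improve its quality. Experimental results on the TUM RGB-D and KITTI datasets demonstrate the effectiveness of the proposed method.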

Other Abstract

Simultaneous Localization and Mapping (SLAM), a key component of autonomous navigation for intelligent mobile robots, has attracted great attention in recent years. Among the various types of SLAM, monocular SLAM has become a popular topic in visual SLAM because of its simple structure, low cost, flexibility, and strong scalability. However, monocular SLAM can usually build only sparse or semi-dense maps of the environment, which cannot support applications such as robot navigation and obstacle avoidance. This thesis focuses on simultaneous localization and dense semantic mapping based on depth estimation. The main contents of the thesis are summarized as follows:
1. Because robust feature matching is difficult to establish under challenging conditions such as large viewpoint variations and severe lighting changes, we propose a local feature descriptor that combines 2D image information with 3D geometric information. A non-linear feature fusion captures the complementary information between the two kinds of features. Experimental results on keypoint matching and pairwise registration tasks show that the proposed descriptor outperforms both single-modality feature descriptors and direct fusion by concatenating the different features.
2. Because current monocular depth estimation methods struggle to generate depth images with clear, sharp details, we combine a global self-attention mechanism with dynamic guided upsampling to learn monocular depth estimation. On the one hand, the self-attention mechanism captures long-range dependencies by computing the representation of each image location as a weighted sum of the features at all image locations. On the other hand, a dynamic guided upsampling module uses a dynamically generated kernel, conditioned on low-level features, to guide the upsampling of the coarse depth map. Experimental results show that the proposed method generates visually pleasing and highly accurate depth maps on the indoor NYU dataset and the outdoor KITTI dataset.
3. To further improve the accuracy of depth prediction and camera pose estimation for monocular videos, we propose a method that iteratively updates the predicted depth and camera pose by combining the respective advantages of self-supervised monocular depth estimation and monocular SLAM. On the one hand, pseudo RGB-D SLAM with CNN-predicted depth achieves more reliable camera pose estimation than monocular SLAM by incorporating pseudo-depth information, robust optimization algorithms, and loop closure detection. On the other hand, the sparse map obtained from monocular SLAM guides the depth estimation network and improves its performance. Experimental results on the TUM RGB-D and KITTI datasets demonstrate the effectiveness of the proposed method.
4. We further build a dense semantic map based on the predicted depth. The method first obtains 3D object semantic information through 2D image object detection and 3D point cloud segmentation. Next, objects are associated according to the overlap ratio between their point clouds, which yields the 3D object semantic map. At the same time, a dense octomap of the environment is built from the dense point cloud and the camera pose. Experimental results show that depth estimation can help monocular SLAM build dense octomaps and 3D object semantic maps that support applications such as robot navigation and obstacle avoidance.
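
The object association step in item 4 can be illustrated with a minimal sketch. The abstract does not specify how the overlap ratio is computed, so everything below is an assumption made for illustration: the voxel-based overlap measure, the choice to normalize by the smaller cloud, the helper names (`voxel_keys`, `overlap_ratio`, `associate`), and the default `voxel_size` and `threshold` values are hypothetical and are not taken from the thesis.

```python
import math

def voxel_keys(points, voxel_size=0.05):
    """Map each 3D point to the integer index of the voxel that contains it."""
    return {
        (math.floor(x / voxel_size),
         math.floor(y / voxel_size),
         math.floor(z / voxel_size))
        for x, y, z in points
    }

def overlap_ratio(cloud_a, cloud_b, voxel_size=0.05):
    """Share of the smaller cloud's occupied voxels that both clouds occupy.

    Assumption: normalizing by the smaller cloud is one possible definition;
    the thesis may define the ratio differently.
    """
    keys_a = voxel_keys(cloud_a, voxel_size)
    keys_b = voxel_keys(cloud_b, voxel_size)
    if not keys_a or not keys_b:
        return 0.0
    shared = len(keys_a & keys_b)
    return shared / min(len(keys_a), len(keys_b))

def associate(detection, map_objects, threshold=0.5, voxel_size=0.05):
    """Return the index of the best-overlapping map object, or None.

    None means no existing object overlaps enough, so the caller would
    add the detection to the map as a new object.
    """
    best_index, best_ratio = None, 0.0
    for index, obj in enumerate(map_objects):
        ratio = overlap_ratio(detection, obj, voxel_size)
        if ratio > best_ratio:
            best_index, best_ratio = index, ratio
    return best_index if best_ratio >= threshold else None
```

In this sketch a new detection is merged into the map object it overlaps most, provided the ratio reaches the threshold; otherwise it is treated as a new object. Quantizing points to voxels keeps the comparison cheap and tolerant of small depth errors, at the cost of sensitivity to the chosen voxel size.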

Document Type: Thesis
Recommended Citation
GB/T 7714
邢晓霞. 基于深度信息的同时定位与稠密建图方法研究[D]. 中国科学院自动化研究所. 中国科学院自动化研究所,2022.
Files in This Item:
File Name/Size | DocType | Version | Access | License
CASIA Thesis -邢晓霞.pd (7942KB) | Thesis | | Open Access | CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.