Research on Deep Learning-Based Semantic Mapping and Super-Resolution for Indoor Scenes
陈睿进
Subtype: Master's thesis
Thesis Advisor: 高伟
Date: 2020-05-26
Degree Grantor: University of Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Discipline: Pattern Recognition and Intelligent Systems
Keywords: deep learning; semantic mapping; depth map; super-resolution; residual network
Abstract

As robots are applied ever more widely in production and daily life, higher demands are placed on their ability to interact with the environment: a robot is expected to understand and execute natural-language instructions from humans and to carry out human-robot interaction tasks such as fetching and placing objects or answering questions about the environment. To achieve this goal, the robot must understand the three-dimensional (3D) semantic information of the scene it occupies, i.e., recognize the semantic categories of objects in the model while reconstructing the environment in 3D, thereby building a map containing semantic information that the robot can query and retrieve. Building a semantic map depends on the indoor depth maps provided by a depth camera, so the accurate acquisition of depth information, the foundation of 3D reconstruction, is very important. Depth maps can now be obtained conveniently from low-cost depth cameras, but under such hardware conditions their resolution is usually low. Applying super-resolution to a low-resolution depth map yields a high-resolution depth map, which can further improve the accuracy of 3D reconstruction. This thesis applies deep learning methods to robotic indoor semantic mapping and depth map super-resolution. The main contributions are:

1. To address the real-time performance and accuracy of robotic indoor semantic mapping, an indoor semantic mapping method is proposed that combines a deep neural network for 3D point clouds with a real-time 3D reconstruction system. PointNet++ semantically segments the point cloud that ElasticFusion generates from RGB-D images during real-time 3D reconstruction, and a Bayesian update scheme uses the camera pose to compute point-cloud positions and update the global semantic map of the indoor scene. This realizes semantic mapping of indoor scenes, moves beyond traditional semantic mapping based on image semantic segmentation, and generates the 3D scene point cloud and its semantic segmentation simultaneously. The method achieves 60%–70% pixel-level classification accuracy on semantic classes with distinct geometric structure, and on some semantic classes improves pixel-level classification accuracy by more than about 5% over traditional methods.

2. To address blurred details in depth map super-resolution, a depth map super-resolution reconstruction technique based on a dual-branch residual network is proposed. Through a nested structure of parallel residual blocks, groups, and levels, it performs multi-scale channel-feature extraction, interaction, and upsampling on the color image and depth map, generating the high-resolution depth map end to end. This overcomes the problems of overly simple channel-feature fusion and of artifacts introduced into the depth map when a high-resolution color image guides the upsampling of a low-resolution depth map. Tests on the Middlebury dataset show that, compared with traditional methods, the proposed method reduces the average root-mean-square error by about 20% at every sampling factor.

Other Abstract

As robots find ever wider application in production and daily life, people place higher requirements on the interaction between robot and environment: the robot should understand and execute natural-language instructions from humans and perform human-robot interaction tasks such as fetching and placing objects or answering questions about the environment. To achieve these goals, the robot needs to understand the three-dimensional (3D) semantic information of its environment, i.e., to label the categories of objects in the model while completing the 3D reconstruction of the environment, so as to build a map containing semantic information that the robot can query and retrieve. The construction of a semantic map depends on the indoor depth maps provided by the depth camera, so the accurate acquisition of depth information, the basis of 3D reconstruction, is very important. At present, depth maps can be obtained easily and cheaply with low-cost depth cameras; however, the resolution of depth maps obtained under such hardware conditions is usually low. A low-resolution depth map must be transformed into a high-resolution one through super-resolution processing before it can improve the accuracy of 3D reconstruction. In this thesis, deep learning methods are employed to study robotic indoor semantic mapping and depth map super-resolution. The main contributions are:

1. To address the real-time performance and accuracy of robotic indoor semantic mapping, an indoor semantic mapping method is proposed that combines a deep neural network for 3D point clouds with a real-time 3D reconstruction system. The method uses PointNet++ to semantically segment the point cloud generated from RGB-D images during ElasticFusion's real-time 3D reconstruction, and uses Bayesian updating with the camera pose to compute point-cloud positions and update the global semantic map of the indoor scene. It realizes semantic mapping of the indoor scene, moves beyond the traditional semantic mapping formulation based on image semantic segmentation, and generates the 3D scene point cloud and its semantic segmentation simultaneously. It achieves 60%–70% pixel-level classification accuracy on semantic classes with distinct geometric structure, and improves pixel-level classification accuracy by more than 5% on several semantic classes compared with traditional methods.
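The recursive Bayesian label update used to fuse per-frame segmentation results into a persistent map can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes each map point stores a per-class probability vector and that each PointNet++ pass yields per-class softmax scores for the points it observes; the function name and the four-class setup are hypothetical.

```python
import numpy as np

def bayesian_label_update(prior: np.ndarray, observation: np.ndarray) -> np.ndarray:
    """Fuse a new per-class probability observation into a point's running
    label distribution: posterior ∝ prior × likelihood, then renormalize."""
    posterior = prior * observation
    return posterior / posterior.sum()

# Example: a point seen in two frames, both suggesting class 1 ("chair", say).
prior = np.full(4, 0.25)               # start from a uniform distribution
obs1 = np.array([0.1, 0.7, 0.1, 0.1])  # softmax scores from frame 1
obs2 = np.array([0.2, 0.6, 0.1, 0.1])  # softmax scores from frame 2
p = bayesian_label_update(bayesian_label_update(prior, obs1), obs2)
print(p.argmax())  # class 1 is the MAP label, with sharpened confidence
```

Repeated observations of the same class sharpen the distribution, which is why this fusion is more robust than taking each frame's segmentation at face value.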

2. To address blurred details in depth map super-resolution, a novel depth map super-resolution reconstruction technique based on a dual-branch residual network is proposed. Through a parallel nested structure of residual blocks, groups, and levels, the technique performs multi-scale feature extraction, interaction, and upsampling on the depth map and color image, generating the high-resolution depth map end to end. It overcomes the problems of overly simple channel-wise feature fusion and of artifacts introduced into the depth map when the high-resolution color image guides the upsampling of the low-resolution depth map. Evaluation on the Middlebury dataset shows that the proposed technique reduces the average root-mean-square error by about 20% compared with traditional methods at every sampling factor.
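The abstract does not specify the exact architecture, so the following PyTorch sketch only illustrates the general idea of dual-branch color-guided depth super-resolution: parallel residual branches for depth and color, channel-wise fusion by concatenation and a 1×1 convolution, and pixel-shuffle upsampling. All layer widths, names, and the single-residual-block depth are chosen for illustration and are much smaller than a real network.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Standard residual block: two 3x3 convs with an identity skip."""
    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class DualBranchSR(nn.Module):
    """Toy guided depth SR: a color branch supplies guidance features that
    are fused into the depth branch before pixel-shuffle upsampling."""
    def __init__(self, ch: int = 32, scale: int = 4):
        super().__init__()
        self.depth_head = nn.Conv2d(1, ch, 3, padding=1)
        self.color_head = nn.Conv2d(3, ch, 3, padding=1)
        self.depth_body = ResBlock(ch)
        self.color_body = ResBlock(ch)
        self.fuse = nn.Conv2d(2 * ch, ch, 1)     # channel-wise fusion
        self.color_down = nn.AvgPool2d(scale)    # bring HR color to the LR grid
        self.up = nn.Sequential(
            nn.Conv2d(ch, ch * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),              # learned x`scale` upsampling
            nn.Conv2d(ch, 1, 3, padding=1))

    def forward(self, depth_lr, color_hr):
        d = self.depth_body(self.depth_head(depth_lr))
        c = self.color_body(self.color_head(self.color_down(color_hr)))
        return self.up(self.fuse(torch.cat([d, c], dim=1)))

net = DualBranchSR(ch=32, scale=4)
with torch.no_grad():
    hr = net(torch.rand(1, 1, 32, 32), torch.rand(1, 3, 128, 128))
print(hr.shape)  # torch.Size([1, 1, 128, 128])
```

The thesis's network nests residual blocks into groups and levels with richer cross-branch interaction at multiple scales; this sketch keeps only one block per branch and a single fusion point to show the data flow.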

Pages: 58
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/39113
Collection: Graduates / Master's theses
Recommended Citation (GB/T 7714):
陈睿进. 基于深度学习的室内场景语义建图与超分辨率技术研究[D]. 中国科学院自动化研究所. 中国科学院大学,2020.
Files in This Item:
基于深度学习的室内场景语义建图与超分辨率 (5489 KB) | DocType: Thesis | Access: Restricted | License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.