Research on High-Precision 3D Reconstruction of Indoor Scenes Based on RGB-D Images

	Research on High-Precision 3D Reconstruction of Indoor Scenes Based on RGB-D Images
	Li Jianwei (李建伟)
	2019-05-28
Pages	116
Degree type	Doctoral
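Abstract (translated from the Chinese)	Indoor scene 3D reconstruction from RGB-D images is an important task in computer vision. Its main purpose is to estimate camera poses from 2D color images and depth information and to obtain a 3D scene model. This research is widely applied in autonomous navigation of mobile robots and digital cultural heritage preservation, and it is one of the key technologies of virtual reality (VR) and augmented reality (AR). Obtaining an accurate and complete 3D model of an indoor scene with a consumer-grade RGB-D camera is a challenging problem in 3D reconstruction research. This thesis studies the problem systematically and proposes several methods for indoor scene 3D reconstruction from RGB-D images. The main work and contributions are as follows:

1. To address the heavy noise in depth images and the accumulated error in camera pose estimation, a high-precision 3D reconstruction method for large indoor scenes based on adaptive local-global registration is proposed. First, the distribution of noise in depth data is analyzed and an adaptive bilateral filtering algorithm is proposed, in which the variance of the range Gaussian kernel is adjusted automatically according to depth, achieving edge-preserving denoising of distant regions of the image. Then, the image sequence is segmented automatically according to its visual content; local registration is performed within each segment, and loop closure detection and global optimization are performed across segments, which effectively reduces the accumulated error of camera pose estimation and enables reconstruction of large scenes. Finally, a region-of-interest model is proposed and combined with the noise characteristics for weighted volumetric fusion, which preserves the geometric details of the model. Experimental results show that the method improves the robustness and accuracy of 3D reconstruction with a consumer-grade RGB-D camera. On the Augmented ICL-NUIM benchmark, the average camera localization accuracy and the average model accuracy are 24.2% and 15.9% higher, respectively, than the results of mainstream methods in the literature.

2. To improve the reliability of visual localization in weakly textured indoor regions and to improve reconstruction efficiency, a fast and robust CPU-based indoor scene 3D reconstruction method is proposed. First, a visual localization algorithm that combines feature point tracking with edge tracking is proposed; depth information assists edge detection and matching, yielding fast and robust visual localization. Then, a camera view selection algorithm based on the camera motion state and visual similarity detection is proposed to remove the redundant data caused by loops and slow camera motion during scanning, and a multi-resolution octree structure is used to store the data, yielding efficient volumetric fusion. Experimental results show that, on a CPU (Intel Core i7-4790), camera tracking runs at about 45 Hz and volumetric fusion reaches 81 Hz. On the TUM RGB-D and Augmented ICL-NUIM benchmarks, the average camera localization accuracy is about 26.8% higher than the results of mainstream methods in the literature; the method models weakly textured scenes better than the other methods and takes the least time.

3. To further address the low resolution of depth images and their missing data, a depth image super-resolution and completion method based on a deep learning framework is proposed to improve the quality of 3D reconstruction. First, a DlapSRN network is trained to learn high-resolution depth images from low-resolution depth images, and outliers in the depth image are removed by gradient sensitivity detection, achieving depth image super-resolution. Then, two deep networks with the VGG-16 architecture learn surface normals and occlusion boundaries from the high-resolution color image that corresponds to the depth image, and the blurriness of the color image is measured. Finally, the depth image is jointly optimized using the surface normals, occlusion boundaries, and blurriness information, which effectively completes the missing depth data. Experimental results show that, on the Middlebury dataset with synthetic noise, the average accuracy of depth images enhanced with this method is about 15.9% higher than that of common depth image enhancement methods in the literature. On the ICL-NUIM dataset, 3D reconstruction with depth images enhanced by this method improves localization accuracy by about 74.1% over reconstruction with low-resolution depth images, and the model quality is improved as well.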
English abstract	Indoor scene 3D reconstruction from RGB-D images is an important task in computer vision. Its main purpose is to estimate camera poses from color images and depth information and to obtain a 3D scene model. It has wide applications in autonomous navigation of mobile robots, digital cultural heritage preservation, virtual reality (VR), and augmented reality (AR). Obtaining an accurate and complete 3D model with consumer RGB-D cameras remains a challenging problem in indoor scene 3D reconstruction. This thesis presents a systematic study of the problem and proposes several indoor scene 3D reconstruction methods based on RGB-D images. The main contributions of this thesis are as follows:

1. To address severe depth noise and accumulated errors in camera pose estimation, a 3D reconstruction method based on adaptive local-global registration is proposed, which can reconstruct large indoor scenes in detail with a consumer RGB-D camera. First, to achieve edge-preserving denoising of the distant (background) regions of the depth image, an adaptive bilateral filter is designed whose range Gaussian kernel is adjusted adaptively with the depth value. Then, the depth image sequence is automatically partitioned into fragments of various sizes by the proposed content-based segmentation method. Each fragment is fused locally with the ICP registration algorithm, and global loop closure and optimization are performed over all fragments to effectively reduce the accumulated errors of camera localization. Finally, an adaptive weighted volumetric method based on a region-of-interest model and the depth noise characteristics is proposed and used to fuse the registered data into a global model with sufficient geometric detail. Experimental results demonstrate that the approach increases the robustness and accuracy of a 3D reconstruction system based on a consumer RGB-D camera. On the Augmented ICL-NUIM benchmark, the accuracies of camera localization and surface reconstruction are 24.2% and 15.9% higher, respectively, than the results of other methods in the literature.

2. To improve the reliability of visual localization in texture-less areas of indoor scenes and to improve the efficiency of 3D reconstruction, a fast and robust CPU-based RGB-D scene reconstruction method is proposed. First, to obtain fast and robust visual localization, a visual localization algorithm that combines feature point tracking with edge tracking is designed. Depth information is used to accelerate edge selection and edge matching. Then, to enhance the efficiency of volumetric integration, an efficient data fusion strategy is designed to select camera views and integrate RGB-D images at multiple scales. The camera view selection algorithm removes the redundant data caused by loop closure and slow camera motion during scene scanning. Experimental results demonstrate that the method runs at an average of about 45 Hz for camera localization and 81 Hz for data fusion on a 64-bit CPU (Intel Core i7-4790). The average accuracy of the camera localization method is about 26.8% higher than the results of other methods on the TUM RGB-D and Augmented ICL-NUIM datasets. Compared with other methods, the method reconstructs texture-less scenes better and takes the least time.

3. To further address the low resolution of depth images and their missing depth values, a depth image super-resolution and completion method based on a deep learning framework is proposed to improve the quality of 3D reconstruction. First, a DlapSRN network is trained to learn high-resolution depth images from raw low-resolution depth images. At the same time, outliers in the depth image are removed by gradient sensitivity detection. Then, two deep networks built on a VGG-16 backbone are used to learn surface normals and occlusion boundaries from the corresponding high-resolution color images. The blurriness of the high-resolution color image is also measured. Finally, the depth image is jointly optimized with the surface normals, occlusion boundaries, and blurriness information to achieve depth completion. Experimental results demonstrate that the method performs better in both single depth image enhancement and 3D reconstruction. Its average depth image enhancement accuracy is about 15.9% higher than that of other methods on the Middlebury dataset with synthetic noise. On the ICL-NUIM dataset, 3D reconstruction with the enhanced depth images outperforms reconstruction with low-resolution depth images: the accuracy of camera localization is improved by about 74.1%, and the quality of the scene model improves accordingly.
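As an illustration of the adaptive bilateral filter described in contribution 1, the following is a minimal NumPy sketch, not the thesis implementation. It assumes that the standard deviation of the range Gaussian grows quadratically with depth, a model often used for consumer depth sensors; the function name `adaptive_bilateral_filter` and the constants `sigma_r0` and `k` are hypothetical and are not taken from the thesis.

```python
import numpy as np


def adaptive_bilateral_filter(depth, radius=2, sigma_s=2.0, sigma_r0=0.01, k=0.005):
    """Bilateral filter whose range sigma increases with depth.

    depth: 2D array in metres; values <= 0 mark missing measurements.
    sigma_r0, k: hypothetical noise-model constants (assumption, not from the thesis).
    """
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    padded = np.pad(depth, radius, mode="edge")

    # Per-pixel range sigma: distant pixels are noisier, so they are smoothed more.
    sigma_r = sigma_r0 + k * depth ** 2

    num = np.zeros_like(depth)
    den = np.zeros_like(depth)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbour image shifted by (dy, dx), aligned with `depth`.
            shifted = padded[radius + dy:radius + dy + h,
                             radius + dx:radius + dx + w]
            w_s = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
            w_r = np.exp(-((shifted - depth) ** 2) / (2.0 * sigma_r ** 2))
            weight = w_s * w_r * (shifted > 0)  # skip missing neighbours
            num += weight * shifted
            den += weight

    out = np.where(den > 0, num / np.maximum(den, 1e-12), 0.0)
    out[depth <= 0] = 0.0  # missing pixels stay missing
    return out
```

Because the range sigma is evaluated at the centre pixel, a depth discontinuity of a couple of metres receives a near-zero range weight and the edge is preserved, while small fluctuations on distant surfaces are averaged more strongly than those on nearby surfaces.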
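Keywords	3D reconstruction; simultaneous localization and mapping (SLAM); computer vision; image processing; deep learning
Language	Chinese
Seven major research directions: sub-direction classification	3D vision
Document type	Dissertation
Item identifier	http://ir.ia.ac.cn/handle/173211/23784
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems_Robot Vision
Recommended citation (GB/T 7714)	Li Jianwei. Research on High-Precision 3D Reconstruction of Indoor Scenes Based on RGB-D Images[D]. Institute of Automation, Chinese Academy of Sciences, 2019.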

Files in this item
File name/size	Document type	Version type	Access type	License
Thesis.pdf (22791KB)	Dissertation		Open access	CC BY-NC-SA