3D reconstruction of indoor scenes from RGB-D images is an important task in the field of computer vision. Its main purpose is to estimate camera poses from color images and depth information and to obtain a 3D scene model, which has wide applications in autonomous navigation of mobile robots, digital cultural heritage preservation, virtual reality (VR) and augmented reality (AR). How to obtain an accurate and complete 3D model with a consumer RGB-D camera remains an open challenge in indoor scene 3D reconstruction. To this end, a systematic study of this problem is carried out and several indoor scene 3D reconstruction methods based on RGB-D images are proposed in this thesis.
The main contributions of this thesis are as follows:
1. To address the problems of severe depth noise and accumulated errors in camera pose estimation, a 3D reconstruction method based on adaptive local-global registration is proposed, which can elaborately reconstruct large indoor scenes with a consumer RGB-D camera. Firstly, to achieve edge-preserving denoising of the background regions in depth images, an adaptive bilateral filter is designed whose Gaussian kernel in the range space is adaptively adjusted with the depth value. Then, the depth image sequence is automatically partitioned into fragments of various sizes with the proposed content-based segmentation method, and each fragment is locally fused using the ICP registration algorithm. Global loop closure and optimization are performed over all fragments to effectively reduce the accumulated errors of camera localization. Finally, an adaptive weighted volumetric method, based on a region-of-interest model and the depth noise characteristics, is proposed to fuse the registered data into a global model with sufficient geometric detail. Experimental results demonstrate that our approach improves the robustness and accuracy of 3D reconstruction with a consumer RGB-D camera: the accuracies of camera localization and surface reconstruction are 24.2% and 15.9% higher, respectively, than those of other methods in the literature.
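As an illustration of the adaptive bilateral filter described in the first step, the following Python sketch smooths a depth image with a range-space Gaussian whose width scales with the centre depth value, so that noisier far-away background pixels are smoothed more strongly while nearby edges are preserved. The window radius, spatial sigma and the scaling constant k_r are illustrative assumptions, not the parameters used in the thesis.

```python
import numpy as np

def adaptive_bilateral_filter(depth, radius=3, sigma_s=2.0, k_r=0.01):
    """Edge-preserving denoising of a depth image (values in metres).

    The spatial kernel is a fixed Gaussian; the range-kernel width is
    scaled with the centre depth (sigma_r = k_r * d), so distant, noisier
    background pixels are smoothed more aggressively.  radius, sigma_s
    and k_r are illustrative values, not those of the thesis.
    """
    depth = depth.astype(np.float64)
    out = depth.copy()
    h, w = depth.shape
    # Precompute spatial Gaussian weights for the (2r+1) x (2r+1) window.
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma_s ** 2))

    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            d = depth[y, x]
            if d <= 0.0:                       # missing depth, leave as-is
                continue
            window = depth[y - radius:y + radius + 1, x - radius:x + radius + 1]
            sigma_r = max(k_r * d, 1e-6)       # depth-adaptive range sigma
            range_w = np.exp(-((window - d) ** 2) / (2.0 * sigma_r ** 2))
            range_w[window <= 0.0] = 0.0       # ignore invalid neighbours
            weights = spatial * range_w
            out[y, x] = np.sum(weights * window) / (np.sum(weights) + 1e-12)
    return out
```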
2. To improve the reliability of visual localization in texture-less areas of indoor scenes and the efficiency of 3D reconstruction, we propose a fast and robust CPU-based RGB-D scene reconstruction method. Firstly, to obtain fast and robust visual localization, a localization algorithm is designed that combines feature point tracking with edge tracking, where depth information is used to accelerate edge selection and edge matching. Then, to improve the efficiency of volumetric integration, a data fusion strategy is designed that selects camera views and integrates RGB-D images at multiple scales; the camera view selection removes the redundant data caused by loop closures and slow camera motion during scene scanning. Experimental results demonstrate that the average speeds of our method are about 45 Hz for camera localization and 81 Hz for data fusion on a 64-bit CPU (Intel Core i7-4790). The average accuracy of our camera localization is about 26.8% higher than that of other methods on the TUM RGB-D and Augmented ICL-NUIM datasets. Compared with other methods, ours reconstructs texture-less scenes better and requires the least time.
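The camera view selection step can be illustrated with a minimal sketch, assuming camera poses are given as 4x4 camera-to-world matrices: a frame is kept only if its pose differs sufficiently from every previously kept view, which discards the redundant frames produced by slow motion and by revisiting already scanned regions. The translation and rotation thresholds below are assumed for illustration, not the thesis settings.

```python
import numpy as np

def select_views(poses, t_thresh=0.05, r_thresh_deg=5.0):
    """Greedy camera-view selection over a list of 4x4 camera-to-world poses.

    A frame is redundant if it lies within both thresholds of some already
    kept view; such frames (from slow motion or revisited regions) are
    dropped before volumetric integration.  Thresholds are assumptions.
    """
    kept = [0]
    for i in range(1, len(poses)):
        redundant = False
        for k in kept:
            rel = np.linalg.inv(poses[k]) @ poses[i]
            t = np.linalg.norm(rel[:3, 3])
            # Rotation angle recovered from the trace of the relative rotation.
            cos_a = np.clip((np.trace(rel[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
            ang = np.degrees(np.arccos(cos_a))
            if t < t_thresh and ang < r_thresh_deg:
                redundant = True
                break
        if not redundant:
            kept.append(i)
    return kept
```

Comparing each new frame against all kept views, rather than only the most recent one, is what removes the duplicated data introduced when a loop brings the camera back to an already integrated part of the scene.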
3. To further address the low resolution of and missing data in depth images, a depth image super-resolution and completion method based on a deep learning framework is proposed to improve the quality of 3D reconstruction. First, a DlapSRN network is trained to recover a high-resolution depth image from the raw low-resolution depth image; at the same time, outliers in the depth image are removed based on gradient-sensitivity detection. Then, two deep networks built on the VGG-16 backbone are used to learn surface normals and occlusion boundaries from the corresponding high-resolution color images, and the blurriness of the high-resolution color image is also measured. Finally, the depth image is jointly optimized with the surface normals, occlusion boundaries and blurriness information to achieve depth completion. Experimental results demonstrate that our method performs better on both single depth image enhancement and 3D reconstruction. The average accuracy of depth image enhancement is about 15.9% higher than that of other methods on the Middlebury dataset with synthetic noise. The 3D reconstruction performance using the enhanced depth images is better than that using the low-resolution depth images on the ICL-NUIM dataset: the accuracy of camera localization is increased by about 74.1%, and the quality of the scene model is improved accordingly.
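A minimal sketch of the gradient-sensitivity outlier detection used before completion, under the assumption that depth outliers manifest as abnormally large local depth gradients; the finite-difference gradient operator and the threshold value are illustrative, not the thesis settings.

```python
import numpy as np

def remove_depth_outliers(depth, grad_thresh=0.3):
    """Invalidate depth pixels whose local gradient magnitude is abnormally
    large, a simple stand-in for gradient-sensitivity outlier detection.
    `grad_thresh` (metres per pixel) is an assumed value."""
    depth = depth.astype(np.float64)
    gy, gx = np.gradient(depth)                 # finite-difference gradients
    grad_mag = np.sqrt(gx ** 2 + gy ** 2)
    cleaned = depth.copy()
    cleaned[grad_mag > grad_thresh] = 0.0       # 0 marks missing/invalid depth
    return cleaned
```

In practice, pixels at true depth discontinuities also have large gradients, which is why the subsequent joint optimization relies on the learned occlusion boundaries to distinguish genuine object edges from noise-induced outliers.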