Research on Depth Estimation Algorithms Combining Multi-View Geometry and Data-Driven Methods


	Chen Tian (陈天)
	2019-06-13
Pages	92
Degree type	Master's
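Abstract (Chinese version)	With the development of visual perception and deep learning, 3D computer vision has made substantial progress in applications such as augmented reality and autonomous driving. Vision-based perception of 3D information is one of its key foundational, general-purpose technical modules. Aiming to obtain dense and accurate scene depth across different scenes and image acquisition modalities, this thesis studies depth estimation algorithms that combine multi-view geometry with data-driven methods. It reviews the state of the art of depth estimation based on multi-view geometry and of data-driven depth estimation, discusses the strengths and weaknesses of each, explains the methods and principles of depth estimation for different types of input image data, and presents the research background and significance. To address the inherent shortcomings of data-driven depth estimation, the thesis combines it with visual-geometry methods and makes the following contributions. First, it designs an RGB-D data-driven monocular depth estimation framework that jointly optimizes the depth gradient map and the depth map. Because the gradient loss responds strongly in edge regions, the model focuses on object edges in the image. To further improve accuracy in edge regions, data normalization and edge-region weighting are applied during pre-training. An attention module is also introduced so that, within the local receptive field of each layer, the network fuses spatial and channel information to build informative features; this makes the extracted features more targeted and thus improves the model's accuracy. Second, it designs a binocular data-driven monocular depth estimation framework with an unsupervised training scheme based on stereo disparity reconstruction, which removes the need for ground-truth depth during training. Depth is regressed as inverse depth to improve prediction accuracy in outdoor scenes. A local plane optimization algorithm is proposed: surface normals are computed from the depth, and differences between pixel normals are used as a planarity constraint during optimization, improving depth accuracy for pixels in large planar regions. Finally, to address the scale ambiguity of monocular depth estimation models, a linear fusion model is designed that combines depth estimated by multi-view geometry with depth predicted by the data-driven model. The epipolar constraint between views allows fairly accurate depth estimation for commonly observed feature points, and these sparse feature points are used to correct the scale of the network's depth maps for different frames. Moreover, because planar structure in the image is unaffected by scale, the network's depth map can be geometrically converted to roughly estimate the planes in the scene, and this plane information is then used to refine the geometric depth.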
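The joint optimization of the depth map and the depth gradient map, with extra weight on edge regions, can be illustrated by a minimal NumPy sketch. It assumes an L1 depth term, forward-difference gradients, and per-pixel weights of the form 1 + alpha times the normalized ground-truth gradient magnitude. These are illustrative choices, not the thesis's exact loss. The function names and the parameters `lam` and `alpha` are hypothetical.

```python
import numpy as np


def image_gradients(d):
    """Forward-difference x/y gradients of a 2-D map; the last column/row is zero."""
    gx = np.zeros_like(d)
    gy = np.zeros_like(d)
    gx[:, :-1] = d[:, 1:] - d[:, :-1]
    gy[:-1, :] = d[1:, :] - d[:-1, :]
    return gx, gy


def edge_weights(gt, alpha=2.0):
    """Hypothetical per-pixel weights: 1 + alpha * normalized ground-truth gradient magnitude."""
    gx, gy = image_gradients(gt)
    mag = np.hypot(gx, gy)
    peak = mag.max()
    if peak > 0:
        mag = mag / peak  # scale to [0, 1] so alpha sets the maximum extra weight
    return 1.0 + alpha * mag


def joint_depth_gradient_loss(pred, gt, lam=1.0, alpha=2.0):
    """Edge-weighted L1 loss on depth plus edge-weighted L1 loss on depth gradients."""
    w = edge_weights(gt, alpha)
    depth_term = np.mean(w * np.abs(pred - gt))
    pgx, pgy = image_gradients(pred)
    tgx, tgy = image_gradients(gt)
    # The gradient term is large only where predicted and true edges disagree.
    grad_term = np.mean(w * (np.abs(pgx - tgx) + np.abs(pgy - tgy)))
    return float(depth_term + lam * grad_term)
```

A constant depth offset changes only the depth term, because gradients are unchanged. A blurred edge raises both terms, and its pixels carry up to `1 + alpha` times the weight of flat regions. In a real training pipeline the same expressions would be written in an autodiff framework rather than NumPy.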
English abstract	With the development of visual perception and deep learning technology, great progress has been made in 3D computer vision for applications such as augmented reality and autonomous driving. In 3D computer vision, vision-based stereo information perception is one of the most important basic, general-purpose technical modules. This thesis aims to obtain dense and accurate scene depth in different scenarios. Based on different image data acquisition methods, depth estimation algorithms that combine multi-view geometry with data-driven methods are studied. The thesis reviews the state of the art of depth estimation algorithms based on multi-view geometry theory and on data-driven methods, discusses the advantages and disadvantages of these two types of algorithms, and introduces the principles of depth estimation methods for different types of image data. To overcome the inherent defects of data-driven depth estimation methods, the following research combines them with visual geometry algorithms. Firstly, a novel RGB-D data-driven monocular depth estimation framework is introduced. In this framework, the depth gradient map and the depth map are jointly optimized. The gradient loss responds strongly in edge regions, which makes the model focus on the edge regions of the image. To further improve prediction accuracy in edge regions, the thesis uses normalization and edge weighting in the pretraining stage. In addition, an attention mechanism is used in the model, which fuses spatial and channel information so that informative features are constructed within the local receptive field of each layer. This makes the features extracted by the network more targeted and ultimately improves the accuracy of the model. Secondly, a novel binocular data-driven monocular depth estimation framework is introduced.
An unsupervised training paradigm is designed based on binocular disparity reconstruction, so the model can be trained without ground-truth depth. In addition, depth is regressed as inverse depth to improve prediction accuracy in outdoor scenes. A local plane optimization method is proposed: surface normal vectors are computed from the depth, and during optimization the difference between pixel normals is used as a planarity constraint, improving depth prediction accuracy for pixels in large planar regions. Finally, a fusion model that combines multi-view geometric depth estimation with data-driven depth prediction is introduced to reduce the scale uncertainty of learning-based monocular depth prediction. In the fusion model, the epipolar constraint between multiple views yields accurate depth, but to run in real time only the depths of sparse feature points are estimated. Using these sparse depths, the scale of the network-predicted depth maps can be estimated. On the other hand, 3D shape information in the image is independent of the depth scale, so the planes in the scene can be extracted from the predicted depth map, and the geometric depth can then be optimized using these planes and their corresponding homographies.
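Scale correction of a network depth map from sparse, geometrically triangulated feature depths can be sketched as a one-parameter least-squares fit. Minimizing the sum of (s * d_net - d_geo)^2 over the feature points gives the closed form s = sum(d_net * d_geo) / sum(d_net^2). This sketch assumes a pure scale model, which is a simplification of the abstract's "linear fusion model". It also assumes integer pixel coordinates given as (u = column, v = row). `fit_scale`, `rescale_depth`, `sparse_uv`, and `sparse_depth` are hypothetical names.

```python
import numpy as np


def fit_scale(net_depth, sparse_uv, sparse_depth, eps=1e-6):
    """Least-squares scale s minimizing sum_i (s * d_net_i - d_geo_i)^2.

    net_depth: HxW depth map predicted by the network (correct only up to scale).
    sparse_uv: Nx2 integer pixel coordinates (u = column, v = row) of feature points.
    sparse_depth: N depths of those points from multi-view (epipolar) geometry.
    """
    uv = np.asarray(sparse_uv, dtype=int)
    d_geo = np.asarray(sparse_depth, dtype=float)
    d_net = net_depth[uv[:, 1], uv[:, 0]]  # sample network depth at the features
    # Ignore points with invalid (non-positive) depth in either source.
    mask = (d_net > eps) & (d_geo > eps)
    d_net, d_geo = d_net[mask], d_geo[mask]
    if d_net.size == 0:
        raise ValueError("no valid sparse depth correspondences")
    return float(np.dot(d_net, d_geo) / np.dot(d_net, d_net))


def rescale_depth(net_depth, sparse_uv, sparse_depth):
    """Return the network depth map multiplied by the fitted per-frame scale."""
    return fit_scale(net_depth, sparse_uv, sparse_depth) * net_depth
```

Squared error lets a single bad triangulation pull the scale. The median of the ratios d_geo / d_net is a common, more robust alternative when feature matches contain outliers.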
Keywords	Environment perception; 3D computer vision; Visual perception; Depth estimation
Subject area	Computer perception
Discipline	Engineering ; Engineering::Control Science and Engineering
Language	Chinese
Funding	National Natural Science Foundation of China[61503376] ; National Natural Science Foundation of China[NSFC 61633020]
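Seven major research directions: sub-direction	3D vision
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/23851
Collection	Graduates_Master's theses
Recommended citation (GB/T 7714)	Chen Tian. Research on Depth Estimation Algorithms Combining Multi-View Geometry and Data-Driven Methods[D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2019.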

Files in this item
File name/size	Document type	Version type	Access type	License
6.3纸质版.pdf (5263 KB)	Thesis		Restricted access	CC BY-NC-SA