Research on Deep-Learning-Based Visual Odometry and Visual Localization

	Research on Deep-Learning-Based Visual Odometry and Visual Localization
	Wan Yiming (万一鸣)
	2020-05-26
Pages	80
Degree type	Master's
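Chinese abstract	Camera pose estimation is a key component of mobile robotics, autonomous navigation, and augmented reality. Pose estimation generally falls into two categories: absolute and relative. Given a single image, absolute pose estimation computes the pose of the camera in a global coordinate system and is usually called visual localization; relative pose estimation computes the pose between frames and is usually called visual odometry. Deep learning has advanced rapidly in recent years and is widely used in areas such as face recognition and object tracking, and its application to visual pose estimation is attracting growing attention. This thesis studies deep-learning-based monocular visual odometry and visual localization. The main contributions are as follows. For monocular visual odometry, to address the poor generalization of existing networks, a visual odometry model based on multi-task learning is proposed. The model regresses relative pose while using optical flow prediction as an auxiliary task. Learning the tasks jointly lets the network exploit the intrinsic correlation between them and learn better motion features, which reduces the risk of overfitting. Experiments show that the proposed method effectively improves generalization. To address the sensitivity of visual odometry to dynamic objects in the scene, a dynamic-object-aware visual odometry model based on the epipolar constraint is proposed. The model estimates masks of dynamic objects through the epipolar constraint and reduces the weight of the photometric error signal in those regions, which weakens their influence on gradient updates. To counter the overly smooth output of recurrent neural networks, the model also uses the proposed LCGR (Local Convolution and Global RNN) module to strengthen the local information of the image sequence while coordinating global information. Experiments show that the proposed method effectively improves the accuracy of relative pose estimation and makes the network more robust in scenes containing many moving objects. For visual localization, to address the overfitting caused by sparse training data, an end-to-end visual localization method based on an online geometric data augmentation strategy is proposed. The method first estimates image depth in a semi-supervised manner and then randomly synthesizes images from new viewpoints, thereby augmenting the training data. In addition, a geometric consistency loss function is proposed to optimize absolute and relative poses simultaneously. Experiments show that the proposed augmentation strategy enables the network to learn more general and meaningful visual features. Compared with the conventional random-cropping augmentation strategy, the proposed method achieves relative reductions in median error of 77.1% for position and 66.0% for rotation.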
English abstract	Camera pose estimation is essential for robotics, autonomous driving, and augmented reality. Pose estimation can generally be divided into two kinds: absolute pose estimation and relative pose estimation. Given an RGB image, absolute pose estimation calculates the camera pose in the global coordinate system; this is often called visual localization. Relative pose estimation estimates the pose between two consecutive frames; this is usually called visual odometry. In recent years, deep learning has undergone tremendous development and has been widely applied in computer vision, for example in face recognition and target tracking. Deep-learning-based pose estimation is receiving more and more attention. This thesis focuses on deep-learning-based visual odometry and localization. The main contributions are as follows. For visual odometry, to address the poor generalization of current deep neural networks, a novel odometry model based on multi-task learning is proposed. The model learns to estimate relative poses with optical flow prediction as an auxiliary task. Learning the two tasks simultaneously forces the network to explore the relationship between them and helps it learn better motion features, which alleviates the risk of overfitting. Experimental results indicate that the proposed method effectively improves generalization. Visual odometry is easily influenced by dynamic objects in the scene. To solve this problem, a novel odometry model that can perceive dynamic objects is proposed. The model estimates masks of dynamic objects via the epipolar constraint and reduces the weight of the photometric error in those areas. To solve the problem that the output of the recurrent neural network is too smooth, a module named LCGR (Local Convolution and Global RNN) is proposed to enhance the local information of the image sequence and capture global information. Experiments demonstrate that the proposed method improves the accuracy of relative pose estimation and makes the network more robust in scenes that contain many dynamic objects. For visual localization, the network overfits easily because the training data are sparse. To solve this problem, a geometric data augmentation method is proposed. The method first predicts depth maps for the input images in a semi-supervised way and then randomly synthesizes new views using the predicted depth maps, which enriches the training data. In addition, a novel geometry-consistency loss function is proposed to optimize absolute and relative poses simultaneously. Experiments indicate that this method helps the network learn more general and meaningful visual features. Compared with the traditional random cropping strategy, the proposed method reduces the median error by 77.1% for position and 66.0% for orientation.
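The abstract says that dynamic-object masks are estimated through the epipolar constraint and used to down-weight the photometric error. The sketch below illustrates that general idea only and is not the thesis implementation. It assumes pixel matches between two frames are already available (for example from optical flow), uses the Sampson distance as the epipolar error, and applies a soft exponential weight. The function names, the `sigma` parameter, and the choice of Sampson distance are all assumptions made for illustration.

```python
import numpy as np

def skew(t):
    """Return the 3x3 cross-product matrix of a 3-vector."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_from_pose(K, R, t):
    """Fundamental matrix F with x2^T F x1 = 0 for static points,
    assuming the relative pose maps frame 1 to frame 2 as X2 = R @ X1 + t."""
    K_inv = np.linalg.inv(K)
    return K_inv.T @ skew(t) @ R @ K_inv

def sampson_distance(F, p1, p2):
    """First-order epipolar error for N pixel matches given as (N, 2) arrays."""
    x1 = np.hstack([p1, np.ones((len(p1), 1))])   # homogeneous pixels, frame 1
    x2 = np.hstack([p2, np.ones((len(p2), 1))])   # homogeneous pixels, frame 2
    Fx1 = x1 @ F.T                                # each row is F @ x1
    Ftx2 = x2 @ F                                 # each row is F^T @ x2
    num = np.sum(x2 * Fx1, axis=1) ** 2
    den = Fx1[:, 0] ** 2 + Fx1[:, 1] ** 2 + Ftx2[:, 0] ** 2 + Ftx2[:, 1] ** 2
    return num / (den + 1e-12)

def static_weights(dist, sigma=1.0):
    """Soft weights in (0, 1]: close to 1 where the epipolar constraint holds,
    close to 0 where it is violated (hypothetical weighting, sigma assumed)."""
    return np.exp(-dist / sigma)

def weighted_photometric_loss(photo_err, weights):
    """Weighted mean of per-pixel photometric errors."""
    return np.sum(weights * photo_err) / (np.sum(weights) + 1e-12)
```

In a training loop, `R` and `t` would come from the pose network and the weights would multiply the per-pixel photometric error before back-propagation, so that pixels violating the epipolar constraint contribute little to the gradient. A hard mask could be obtained instead by thresholding the distance.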
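Language	Chinese
Seven major directions: sub-direction category	3D vision
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/39139
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems_Robot Vision
Recommended citation (GB/T 7714)	Wan Yiming. Research on Deep-Learning-Based Visual Odometry and Visual Localization [D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2020.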

Files in this item
File name/size	Document type	Version type	Access type	License
基于深度学习的视觉里程计与视觉定位技术研 (11938 KB)	Thesis		Open access	CC BY-NC-SA