CASIA OpenIR > 模式识别国家重点实验室 (National Laboratory of Pattern Recognition) > 机器人视觉 (Robot Vision)
基于深度学习的视觉里程计与视觉定位技术研究 (Research on Visual Odometry and Visual Localization Based on Deep Learning)
万一鸣
Subtype: 硕士 (Master's)
Thesis Advisor: 高伟
2020-05-26
Degree Grantor: 中国科学院大学 (University of Chinese Academy of Sciences)
Place of Conferral: 中国科学院自动化研究所 (Institute of Automation, Chinese Academy of Sciences)
Degree Discipline: 模式识别与智能系统 (Pattern Recognition and Intelligent Systems)
Abstract

Camera pose estimation is a key component of mobile robotics, autonomous navigation, and augmented reality. Pose estimation generally falls into two categories: absolute pose estimation and relative pose estimation. Given a single image, absolute pose estimation computes the camera pose in a global coordinate frame, and is commonly called visual localization; relative pose estimation computes the pose between frames, and is commonly called visual odometry. In recent years, deep learning has developed rapidly and is widely applied in areas such as face recognition and object tracking, and applying it to visual pose estimation has attracted increasing attention. This thesis studies deep-learning-based monocular visual odometry and visual localization. The main contributions are as follows:

For monocular visual odometry, to address the poor generalization of existing networks, a visual odometry model based on multi-task learning is proposed. While regressing relative poses, the model predicts optical flow as an auxiliary task. This multi-task formulation lets the network exploit the intrinsic correlation between the tasks and learn better motion features, reducing the risk of over-fitting. Experiments show that the proposed method effectively improves the network's generalization ability.

To address the sensitivity of visual odometry to dynamic objects in the scene, a dynamic-object-aware visual odometry model based on epipolar constraints is proposed. The model estimates masks of dynamic objects via the epipolar constraint and down-weights the photometric error in those regions, weakening their influence on gradient updates. To counter the over-smoothed output of recurrent neural networks, the model also introduces an LCGR (Local Convolution and Global RNN) module that strengthens local information in the image sequence while aggregating global information. Experiments show that the proposed method effectively improves relative pose accuracy and makes the network more robust in scenes with many moving objects.

For visual localization, to address the over-fitting caused by sparse training data, an end-to-end visual localization method with an online geometric data augmentation strategy is proposed. The method first estimates image depth in a semi-supervised way and then randomly synthesizes images from novel viewpoints, thereby augmenting the training data. In addition, a geometric-consistency loss function is proposed to optimize absolute and relative poses jointly. Experiments show that the proposed augmentation strategy helps the network learn more general and meaningful visual features. Compared with the traditional random-cropping augmentation strategy, the method reduces the median position and rotation errors by 77.1% and 66.0%, respectively.

Other Abstract

Camera pose estimation is essential for robotics, autonomous driving and augmented reality. Pose estimation can generally be divided into two kinds: absolute pose estimation and relative pose estimation. Given an RGB image, absolute pose estimation calculates the camera pose in the global coordinate system; this is often called visual localization. Relative pose estimation estimates the pose between two consecutive frames; this is usually called visual odometry. In recent years, deep learning has undergone tremendous development and has been widely applied in computer vision, for example to face recognition and target tracking, and more and more attention is being paid to deep-learning-based pose estimation. This thesis focuses on deep-learning-based visual odometry and localization; the main contributions are as follows:

For visual odometry, to address the poor generalization of current deep neural networks, a novel odometry model based on multi-task learning is proposed. This model learns to estimate relative poses with optical flow prediction as an auxiliary task. Learning the two tasks simultaneously forces the network to explore the inner relationship between them and helps it learn better motion features; the risk of over-fitting is therefore alleviated. Experimental results indicate that the proposed method effectively improves the ability to generalize.
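The multi-task objective described above can be sketched as a weighted sum of a pose-regression term and an auxiliary optical-flow term. The function below is an illustrative sketch only: the loss form, the rotation weight `beta`, and the auxiliary weight `lam` are assumptions, not the thesis's exact formulation.

```python
import numpy as np

def multitask_vo_loss(pose_pred, pose_gt, flow_pred, flow_gt,
                      beta=10.0, lam=0.1):
    """Illustrative multi-task objective: relative-pose regression
    plus an auxiliary optical-flow prediction loss."""
    t_pred, r_pred = pose_pred[:3], pose_pred[3:]
    t_gt, r_gt = pose_gt[:3], pose_gt[3:]
    # Translation and rotation errors; rotation up-weighted by beta
    pose_loss = np.sum((t_pred - t_gt) ** 2) + beta * np.sum((r_pred - r_gt) ** 2)
    # Auxiliary task: mean end-point error of the predicted flow field
    flow_loss = np.mean(np.linalg.norm(flow_pred - flow_gt, axis=-1))
    # Joint objective couples the two tasks during training
    return pose_loss + lam * flow_loss
```

Training both heads against this single scalar is what lets gradients from the flow task regularize the shared motion features.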

Visual odometry is easily influenced by dynamic objects in the scene. To address this problem, a novel odometry model that can perceive dynamic objects is proposed. The model estimates masks of dynamic objects via the epipolar constraint and reduces the weight of the photometric error in those areas. To counter the over-smoothed output of the recurrent neural network, a module named LCGR (Local Convolution and Global RNN) is proposed to enhance local information in the image sequence while capturing global information. Experiments demonstrate that the proposed method improves the accuracy of relative pose estimation and makes the network more robust in scenes containing many dynamic objects.
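The epipolar-constraint test behind the dynamic-object masks can be illustrated with the Sampson distance: a static point pair satisfies x2ᵀ F x1 = 0, so a large residual suggests independent motion. This is a minimal sketch assuming matched point sets and a known fundamental matrix F; the threshold value is a placeholder, not the thesis's setting.

```python
import numpy as np

def dynamic_mask(pts1, pts2, F, thresh=1.0):
    """Flag correspondences violating the epipolar constraint as
    likely dynamic, using the first-order Sampson distance."""
    ones = np.ones((pts1.shape[0], 1))
    x1 = np.hstack([pts1, ones])          # homogeneous coords, (N, 3)
    x2 = np.hstack([pts2, ones])
    Fx1 = x1 @ F.T                        # epipolar lines in image 2
    Ftx2 = x2 @ F                         # epipolar lines in image 1
    num = np.sum(x2 * Fx1, axis=1) ** 2   # (x2^T F x1)^2
    den = Fx1[:, 0]**2 + Fx1[:, 1]**2 + Ftx2[:, 0]**2 + Ftx2[:, 1]**2
    sampson = num / den
    return sampson > thresh               # True = likely dynamic
```

In the model described above, such a mask is used to down-weight the photometric error rather than to discard pixels outright.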

For visual localization, the network easily over-fits because of the sparsity of training data. To solve this problem, a geometric data augmentation method is proposed. The method first predicts depth maps for the input images in a semi-supervised way and then randomly synthesizes new views using the predicted depths, thereby enriching the training data. In addition, a novel geometry-consistency loss function is proposed to optimize absolute and relative poses simultaneously. Experiments indicate that this method helps the network learn more general and meaningful visual features. Compared with the traditional random-cropping strategy, the proposed method reduces the median error by 77.1% for position and 66.0% for orientation.
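The view-synthesis step of this augmentation can be sketched as back-projecting each pixel with its predicted depth, applying a sampled relative pose, and re-projecting into the novel camera. The code below is a minimal sketch under an assumed pinhole model; the function name, intrinsics K, and pose parameterization (R, t) are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def synthesize_view_coords(depth, K, R, t):
    """Given a depth map for the source image and a relative pose (R, t),
    compute where each source pixel lands in the synthesized view."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # (3, N)
    # Back-project pixels to 3-D camera coordinates using the depth map
    pts3d = np.linalg.inv(K) @ pix * depth.reshape(-1)
    # Transform into the novel camera frame and re-project
    pts_new = R @ pts3d + t[:, None]
    proj = K @ pts_new
    return (proj[:2] / proj[2]).T.reshape(h, w, 2)  # pixel coords in new view
```

Sampling the resulting coordinates from the source image (e.g. with bilinear interpolation) yields the synthesized training view, paired with an absolute pose label updated by the same (R, t).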

 

Pages: 80
Language: 中文 (Chinese)
Document Type: 学位论文 (Degree thesis)
Identifier: http://ir.ia.ac.cn/handle/173211/39139
Collection: 模式识别国家重点实验室_机器人视觉
Recommended Citation (GB/T 7714):
万一鸣. 基于深度学习的视觉里程计与视觉定位技术研究[D]. 中国科学院自动化研究所. 中国科学院大学, 2020.
Files in This Item:
File Name: 基于深度学习的视觉里程计与视觉定位技术研 (11938KB) | DocType: 学位论文 | Access: 开放获取 (Open Access) | License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.