Institutional Repository of Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
|Place of Conferral||Institute of Automation, Chinese Academy of Sciences|
Camera pose estimation is essential for robotics, autonomous driving, and augmented reality. Pose estimation can generally be divided into two kinds: absolute pose estimation and relative pose estimation. Given an RGB image, absolute pose estimation computes the camera pose in the global coordinate system; this is often called visual localization. Relative pose estimation estimates the pose between two consecutive
For visual odometry, to address the poor generalization of current deep neural networks, a novel odometry model based on multi-task learning is proposed. The model learns to estimate relative poses with optical flow prediction as an auxiliary task. Learning the two tasks simultaneously forces the network to exploit the relationship between them and helps it learn better motion features, which reduces the risk of over-fitting. Experimental results indicate that the proposed method effectively improves generalization.
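The multi-task idea can be illustrated with a combined training objective. The following is a minimal sketch, not the thesis's actual loss: the squared-error pose term, the endpoint-error flow term, and the weights `rot_weight` and `flow_weight` are all assumptions chosen for illustration.

```python
from math import sqrt


def pose_loss(pred, target, rot_weight=100.0):
    """Squared error on a 6-vector pose (tx, ty, tz, rx, ry, rz).

    rot_weight is a hypothetical factor balancing rotation against translation.
    """
    t_err = sum((p - t) ** 2 for p, t in zip(pred[:3], target[:3]))
    r_err = sum((p - t) ** 2 for p, t in zip(pred[3:], target[3:]))
    return t_err + rot_weight * r_err


def flow_loss(pred_flow, target_flow):
    """Mean endpoint error over a list of (u, v) flow vectors."""
    total = 0.0
    for (pu, pv), (tu, tv) in zip(pred_flow, target_flow):
        total += sqrt((pu - tu) ** 2 + (pv - tv) ** 2)
    return total / len(pred_flow)


def multitask_loss(pred_pose, gt_pose, pred_flow, gt_flow, flow_weight=0.5):
    """Main task (relative pose) plus weighted auxiliary task (optical flow)."""
    return pose_loss(pred_pose, gt_pose) + flow_weight * flow_loss(pred_flow, gt_flow)


if __name__ == "__main__":
    zero_pose = [0.0] * 6
    print(multitask_loss(zero_pose, zero_pose, [(3.0, 4.0)], [(0.0, 0.0)]))
```

Because both terms are backpropagated through a shared feature extractor, the auxiliary flow term acts as a regularizer on the motion features used for pose regression. How the two terms are actually balanced in the thesis is not stated in this abstract.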
Visual odometry is easily affected by dynamic objects in the scene. To address this, a novel odometry model that can perceive dynamic objects is proposed. The model estimates masks of dynamic objects via the epipolar constraint and reduces the weight of the photometric error in those areas. To address the overly smooth output of the recurrent neural network, a module named LCGR (Local Convolution and Global RNN) is proposed to enhance the local information of the image sequence while capturing global information. Experiments demonstrate that the proposed method improves the accuracy of relative pose estimation and makes the network more robust in scenes containing many dynamic objects.
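The epipolar test for dynamic pixels can be sketched as follows. This is an illustrative example under stated assumptions, not the thesis's implementation: points are in normalized image coordinates, the relative pose maps camera-1 points to camera 2 as X2 = R X1 + t, and the exponential soft weight with scale `sigma` is a hypothetical choice. A static point satisfies x2ᵀ E x1 = 0 with E = [t]ₓ R, so a large distance from the epipolar line suggests independent motion.

```python
from math import exp, sqrt


def skew(t):
    """Cross-product matrix [t]_x of a 3-vector."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def essential_matrix(R, t):
    """E = [t]_x R for the convention X2 = R X1 + t."""
    return matmul(skew(t), R)


def epipolar_distance(E, x1, x2):
    """Distance from x2 to the epipolar line E x1 (normalized coordinates)."""
    line = matvec(E, [x1[0], x1[1], 1.0])
    num = abs(line[0] * x2[0] + line[1] * x2[1] + line[2])
    den = sqrt(line[0] ** 2 + line[1] ** 2)
    return num / den if den > 1e-12 else 0.0


def static_weight(dist, sigma=0.01):
    """Soft mask: near 1 for static pixels, near 0 for likely dynamic ones."""
    return exp(-dist / sigma)


def weighted_photometric_error(errors, weights):
    """Weighted mean of per-pixel photometric errors."""
    return sum(w * e for w, e in zip(weights, errors)) / max(sum(weights), 1e-12)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    E = essential_matrix(identity, (1.0, 0.0, 0.0))
    for x2 in [(0.6, 0.2), (0.3, 0.25)]:
        d = epipolar_distance(E, (0.1, 0.2), x2)
        print(x2, d, static_weight(d))
```

In practice the correspondences x1, x2 would come from estimated optical flow and the pose from the network itself; a hard threshold on the distance would give a binary mask instead of the soft weight used here.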
For visual localization, the network over-fits easily because the training data are sparse. To address this, a geometric data augmentation method is proposed. The method first predicts depth maps for the input images in a semi-supervised way and then randomly synthesizes new views using the predicted depth maps. The training
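|Wan Yiming. Research on Visual Odometry and Visual Localization Based on Deep Learning [D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2020.|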
|Files in This Item:|
|基于深度学习的视觉里程计与视觉定位技术研 (11938KB)||Thesis||Open Access||CC BY-NC-SA||Application Full Text|
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.