Research on Large-Scale Outdoor Visual Localization Fusing Learned Features (融合学习特征的室外大场景视觉定位研究)
张朋举
2022-05-17
Pages: 144
Subtype: Doctoral
Abstract

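The task of visual localization is to compute, from the relevant image information, the position and orientation of the camera that captured an image, i.e., the camera pose. Visual localization is an important research direction in 3D computer vision, with wide application and significant value in robotics, autonomous driving, human-computer interaction, augmented reality, and related fields. This thesis studies visual localization in large-scale outdoor scenes from a single image, given a known 3D model. Geometry-based visual localization methods are widely used because they rest on a clear theoretical framework, but they are difficult to apply to large outdoor scenes with large changes in viewpoint, weather, and illumination. In recent years, learning-based methods have developed considerably and offer strong robustness in feature representation. How to bring the strengths of learning-based methods into the geometry-based localization framework, and thereby solve visual localization in complex, large outdoor scenes, therefore has both theoretical significance and practical value. This thesis investigates in depth how to fuse learned features into the geometry-based visual localization framework. The main contributions are as follows: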

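1. A visual localization method that combines random forests with learned local descriptors is proposed. The method first introduces a weighted Hamming distance loss function and designs a network that co-trains real-valued and binary descriptors (Co-train Real-valued and Binary descriptor, CRBNet) to reduce the quantization loss caused by binarizing deep-learning descriptors. The supervision information between CRBNet binary descriptors in the 3D model is used to train the random forest. The random forest then indexes the millions of 3D points in the 3D model so that candidate nearest neighbors of a query descriptor can be found quickly. CRBNet real-valued descriptors are used to find the nearest neighbor of the query descriptor among those candidates. In addition, a probabilistic model for learned descriptors is proposed to predict the leaf nodes of the random forest most likely to contain the nearest neighbor of the query descriptor. Experiments show that the method achieves good results in both localization speed and accuracy.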
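The coarse-to-fine lookup in contribution 1 (binary descriptors to shortlist candidates, real-valued descriptors to pick the final neighbor) can be illustrated with a minimal sketch. It is not the thesis implementation: sign binarization stands in for the learned CRBNet binary descriptor, a brute-force Hamming scan stands in for the random-forest index, and the names `binarize`, `hamming`, and `two_stage_nn` are hypothetical.

```python
import math

def binarize(vec):
    """Sign-binarize a real-valued descriptor (assumed stand-in for a learned binary code)."""
    return [1 if x > 0 else 0 for x in vec]

def hamming(a, b):
    """Number of positions at which two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))

def two_stage_nn(query, database, k=2):
    """Return the index of the database descriptor chosen by a two-stage search."""
    q_bin = binarize(query)
    # Stage 1: shortlist the k entries with the smallest Hamming distance.
    # This brute-force scan is a placeholder for an index such as a random forest.
    order = sorted(range(len(database)),
                   key=lambda i: hamming(q_bin, binarize(database[i])))
    shortlist = order[:k]
    # Stage 2: rank only the shortlist by Euclidean distance on real values.
    return min(shortlist, key=lambda i: math.dist(query, database[i]))
```

The second stage matters when several entries share the same binary code: binarization discards magnitude, so only the real-valued distance can separate them.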

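2. For low-light scenes, a visual localization method for large low-light scenes based on a wavelet-transform image enhancement network is proposed. Low-light images differ in appearance from the images in the 3D model, which makes localization considerably harder. This thesis uses low-light enhancement to bring low-light images to normal-light appearance and then localizes with the enhanced images. Existing low-light enhancement techniques, however, amplify noise when enhancing an image. The method therefore proposes a wavelet-transform-based low-light image enhancement network. The network first decomposes the low-light image, in the frequency domain and based on Retinex theory and the wavelet transform, into an illumination map and an illumination-independent reflectance map; it then enhances the illumination map and removes degradation from the reflectance map; finally, it synthesizes a normal-light image from the enhanced illumination map and the restored reflectance map. Experimental results show that the proposed network suppresses the noise introduced by enhancement to a large extent and, combined with the random-forest-based visual localization described above, improves localization in low-light scenes.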
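The decompose, enhance, and recombine pipeline in contribution 2 follows the Retinex model, in which an image is the product of reflectance and illumination. The sketch below shows only that classical skeleton under loud assumptions: illumination is estimated as the per-pixel channel maximum, enhancement is a fixed gamma curve, and the wavelet transform, the learned sub-networks, and the reflectance restoration step of the thesis are all omitted. The function name `retinex_enhance` and its parameters are hypothetical.

```python
def retinex_enhance(image, gamma=0.5, eps=1e-6):
    """Brighten an image given as rows of (r, g, b) tuples with values in [0, 1]."""
    out = []
    for row in image:
        new_row = []
        for pixel in row:
            # Crude illumination estimate: the brightest channel of the pixel.
            illum = max(pixel)
            # Reflectance: what remains after dividing out the illumination.
            reflect = [c / (illum + eps) for c in pixel]
            # Enhance illumination only; gamma < 1 lifts dark values.
            illum_enh = illum ** gamma
            # Recombine and clamp to the valid range.
            new_row.append(tuple(min(1.0, r * illum_enh) for r in reflect))
        out.append(new_row)
    return out
```

Because only the illumination is changed, the ratios between color channels are preserved. Any noise in the reflectance is scaled up along with the signal, which is the amplification problem the abstract describes.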

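3. A parallel-search visual localization framework based on local and global descriptors is proposed. In essence, random-forest-based visual localization relies on local features to find candidate nearest neighbors of query descriptors, which makes it sensitive to illumination changes; image-retrieval-based visual localization relies on global features to find candidate nearest neighbors of query descriptors, which makes it sensitive to large viewpoint changes. To address these problems, a parallel-search visual localization framework is proposed, in which the local and global features of the image are used in parallel to search for candidate nearest neighbors of query descriptors, from which an accurate pose of the query image is obtained. Experimental results show that the framework combines the strengths of image-retrieval-based and random-forest-based visual localization and is robust to both illumination and large viewpoint changes.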
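The control flow of contribution 3 can be sketched as two candidate searches run side by side and merged before a final linear scan (the English abstract mentions a linear search among the candidates). Everything concrete here is an assumption for illustration: both branches are brute-force stand-ins for the random-forest and image-retrieval components, and `local_candidates`, `global_candidates`, `parallel_search`, and `image_to_points` are hypothetical names.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def local_candidates(query_desc, point_descs, k):
    """Local branch: ids of the k 3D points closest to the query descriptor."""
    order = sorted(range(len(point_descs)),
                   key=lambda i: math.dist(query_desc, point_descs[i]))
    return set(order[:k])

def global_candidates(query_global, image_globals, image_to_points, k):
    """Global branch: 3D points observed in the k most similar database images."""
    order = sorted(range(len(image_globals)),
                   key=lambda i: math.dist(query_global, image_globals[i]))
    return {p for i in order[:k] for p in image_to_points[i]}

def parallel_search(query_desc, query_global, point_descs,
                    image_globals, image_to_points, k=1):
    """Run both branches concurrently, merge candidates, then scan them linearly."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        local_job = pool.submit(local_candidates, query_desc, point_descs, k)
        global_job = pool.submit(global_candidates, query_global,
                                 image_globals, image_to_points, k)
        merged = sorted(local_job.result() | global_job.result())
    # Final linear search over the merged candidate set only.
    best = min(merged, key=lambda i: math.dist(query_desc, point_descs[i]))
    return merged, best
```

Merging by set union means a correct match survives if either branch finds it, which is how such a framework can tolerate the failure mode of one branch.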

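4. For the image retrieval and local keypoint matching steps of visual localization, an image retrieval method based on a semi-global shape-aware network (SGSNet) and a keypoint matching method that learns from image patches and fuses multi-level information (PatchMatcher) are proposed. In image retrieval, existing attention mechanisms consider only the similarity of local features in the network when capturing long-range dependencies and ignore the effect of their positional distance. A semi-global shape-aware network is therefore proposed that accounts for both feature similarity and positional distance when capturing long-range dependencies, together with an efficient binary-tree-based information aggregation method with linear computational complexity. In addition, existing learned matchers take descriptors as input, but descriptors are not the most fundamental appearance information around a keypoint and are therefore not optimal for the matching task. An image-patch-based keypoint matching network is therefore proposed that learns the local, neighborhood, and global consistency of keypoints. Extensive experiments on multiple image retrieval, feature matching, and visual localization datasets show that these methods are effective and substantially improve localization accuracy in difficult scenes (those with severe changes in weather, illumination, and viewpoint).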
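One generic way to make attention depend on both feature similarity and positional distance is to subtract a distance penalty from the similarity logits before the softmax. The sketch below illustrates that idea only. The abstract does not give the SGSNet formulation, so the penalty form, the weight `lam`, and the function name `distance_aware_weights` are assumptions, and this dense version costs quadratic time, unlike the linear-complexity binary-tree aggregation the thesis reports.

```python
import math

def distance_aware_weights(feats, coords, lam=1.0):
    """Row-normalized attention weights from dot-product similarity minus lam * distance."""
    weights = []
    for f_i, p_i in zip(feats, coords):
        # Logit for each pair: feature similarity penalized by positional distance.
        logits = [
            sum(a * b for a, b in zip(f_i, f_j)) - lam * math.dist(p_i, p_j)
            for f_j, p_j in zip(feats, coords)
        ]
        # Softmax with max subtraction for numerical stability.
        top = max(logits)
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

With `lam=0` this reduces to similarity-only attention, the behavior the abstract attributes to existing mechanisms; increasing `lam` shifts weight toward spatially nearby features.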

Other Abstract

Visual localization computes the 6 degree-of-freedom (6DoF) camera pose of an image from the corresponding image information. It is an important research topic in three-dimensional (3D) computer vision and is widely used in robotics, autonomous driving, human-computer interaction, augmented reality, and related fields. This thesis focuses on large-scale outdoor visual localization from a single image, given that 3D models are available. Geometry-based visual localization methods are popular because of their clear theoretical foundation, but they still struggle in large-scale outdoor scenes with severe viewpoint, weather, and illumination changes. In recent years, learning-based methods have made great progress thanks to robust learned features. How to fuse learned features into geometry-based visual localization methods, so as to overcome the difficulty of large-scale outdoor visual localization, is therefore a meaningful and challenging topic. We study in depth how to fuse learned features into geometry-based visual localization methods. The main contributions are as follows:

1. A visual localization method that fuses random trees and learning-based local descriptors is proposed. We first propose a weighted Hamming distance loss function and use it to design a network that co-trains real-valued and binary descriptors (CRBNet), reducing the quantization loss caused by binarizing learned real-valued descriptors. The supervision information between CRBNet binary descriptors in the 3D models is used to train the random trees. The random trees then index the 3D points in the 3D models so that candidate nearest neighbors of query descriptors can be found quickly and reliably. CRBNet real-valued descriptors are used to find the nearest neighbors of query descriptors among these candidates. In addition, we propose a probability model for learned descriptors that predicts the leaf nodes most likely to contain the nearest neighbors of query descriptors. Extensive experiments show that this method achieves good results in both speed and accuracy.

2. For low-light scenes, a visual localization method with a wavelet-embedded low-light image enhancement network is proposed. The appearance difference between low-light and normal-light images makes low-light localization very difficult. If existing low-light enhancement techniques are used to convert low-light images to normal-light images before localization, accuracy is very low because the enhanced images usually contain amplified noise. We therefore propose a wavelet-transform-embedded network for low-light image enhancement. In this network, a given low-light image is first decomposed into illumination and reflectance in the frequency domain, based on Retinex theory and the wavelet transform. The illumination is then enhanced and the degradation of the reflectance is removed. Finally, a normal-light image is synthesized from the enhanced illumination and the degradation-removed reflectance. Experimental results show that the proposed network greatly suppresses noise when enhancing low-light images and that it greatly improves the performance of the visual localization method described above.

3. A parallel search framework for visual localization is proposed. In essence, random-trees-based visual localization methods rely on local features to find candidate nearest neighbors of query descriptors, which makes them sensitive to large illumination changes, whereas image-retrieval-based methods rely on global features to find candidate nearest neighbors, which makes them sensitive to large viewpoint changes. To solve these problems, we propose a parallel search framework in which local and global descriptors are used in parallel to search for candidate nearest neighbors of query descriptors; 2D-3D correspondences are then found by linear search among the candidates to obtain accurate 6DoF poses of query images. Experimental results show that this framework integrates the advantages of image-retrieval-based and random-trees-based methods and is robust to both large illumination and large viewpoint changes.

4. For image retrieval and keypoint matching, two of the most important components of visual localization, we propose a semi-global shape-aware network for image retrieval and a multilevel consensus network for keypoint matching. In image retrieval, existing attention mechanisms consider only the similarity of local features when capturing long-range dependencies and ignore their geometric proximity. We therefore propose a semi-global shape-aware network (SGSNet) that takes both feature similarity and geometric proximity into account when capturing long-range dependencies, together with an efficient binary-tree-based information aggregation method with linear computational complexity. In addition, existing learning-based keypoint matchers take descriptors as input, but descriptors are not the most essential appearance information around keypoints and are therefore not optimal for matching. We therefore propose a keypoint matching network (PatchMatcher) that operates on patches around keypoints and considers the local, adjacent, and global consensus of keypoint features. Extensive experiments on image retrieval, keypoint matching, and visual localization tasks show that these methods are effective and greatly improve visual localization performance in challenging scenes.

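Keyword: visual localization, parallel search, learned features, image retrieval, low-light enhancement
Subject Area: Artificial intelligence, other disciplines
MOST Discipline Catalogue: Engineering::Computer Science and Technology (degrees in engineering or science may be conferred)
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/48734
Collection: Graduates_Doctoral Dissertations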
Recommended Citation
GB/T 7714
张朋举. 融合学习特征的室外大场景视觉定位研究[D]. 中国科学院自动化研究所. 中国科学院大学人工智能学院,2022.
Files in This Item:
File Name/Size: 融合学习特征的室外大场景视觉定位研究-张 (16113KB)
DocType: Thesis
Access: Restricted
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.