Research on Feature-Matching-Based Visual Localization for Scenes with Large Viewpoint Changes (基于特征匹配的大视角场景视觉定位方法研究)
邓加昕
2024-05
Pages: 74
Subtype: Master's thesis
Abstract

Visual localization is an important research task in computer vision. It takes the camera as the primary sensor and studies the localization problem through the images the camera captures, and it has deep and extensive applications in mobile robots, autonomous vehicles, augmented reality, and other fields. Visual localization thus offers both rich research value and broad application prospects.

 

Current visual localization solutions typically rely on a pre-built 3D map: local features are matched between the query image and reference images, and the precise camera pose is recovered from the resulting correspondences. In recent years, with the rise of deep learning, learned features have gradually replaced hand-crafted ones. Learning-based features are more robust to illumination and seasonal changes and perform better in challenging scenes such as repetitive patterns and weakly textured regions. However, for the large camera viewpoint changes that frequently arise in practice, the small overlapping area prevents effective and reliable keypoints from being extracted, while the many irrelevant features make highly repeatable features hard to obtain; current methods therefore tend to perform poorly and generally fail to meet accuracy and robustness requirements.
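For concreteness, the map-based pipeline described above boils down to descriptor matching followed by robust PnP. The sketch below is a minimal, hypothetical illustration of that standard pipeline, not the thesis code; it assumes a map that stores, per reference image, descriptors and their associated 3D points, and all function and variable names are illustrative.

```python
# Hypothetical minimal pipeline: descriptor matching + robust PnP.
# All names are illustrative; this is not the thesis implementation.
import numpy as np
import cv2


def localize(query_kpts, query_desc, ref_desc, ref_pts3d, K):
    """query_kpts: (N,2); query_desc: (N,D); ref_desc: (M,D);
    ref_pts3d: (M,3) 3D map points; K: (3,3) intrinsics (float64)."""
    # Mutual nearest-neighbour matching on descriptor similarity.
    sim = query_desc @ ref_desc.T                    # (N, M)
    nn12, nn21 = sim.argmax(1), sim.argmax(0)
    mutual = nn21[nn12] == np.arange(len(nn12))
    idx = np.nonzero(mutual)[0]

    pts2d = query_kpts[idx].astype(np.float64)
    pts3d = ref_pts3d[nn12[idx]].astype(np.float64)

    # RANSAC-wrapped PnP rejects the remaining wrong matches.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d, pts2d, K, None, reprojectionError=3.0)
    return (rvec, tvec) if ok else None
```

Mutual nearest-neighbour filtering plus the RANSAC loop inside cv2.solvePnPRansac is what gives this baseline its tolerance to a moderate fraction of wrong matches; the thesis targets the regime where large viewpoint change pushes that fraction too high.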

 

This thesis therefore studies visual localization in scenes with large viewpoint changes. First, it improves current visual localization methods based on local feature matching: most existing methods do not consider the structural information of features, rely too heavily on similar local appearance features, and have a poor grasp of global semantics. A geometric information filtering module is therefore introduced into the feature update process, and the keypoint feature pooling method is improved. Second, for scenes with an extreme viewpoint span, local feature matching methods often fail to localize because similar features can hardly be detected. A visual localization method based on ground-aerial multi-view feature fusion is therefore proposed, which fuses and matches features in a top-down view and achieves precise localization under extreme viewpoint changes. The research work and results are as follows:

 

1. A local feature matching algorithm based on geometric information filtering: As one stage of visual localization, feature matching is required to support accurate camera pose estimation under day-night, weather and seasonal, and viewpoint changes. For temporal and seasonal variation, deep features have, with the rise of deep learning, often been able to learn the needed invariance. However, in scenes with large viewpoint changes, the key effective regions are few, repeatable features are insufficient, and invalid information and noise are plentiful, so current deep matchers degrade when the viewpoint changes substantially. To address this problem, this thesis proposes a local feature matching algorithm that introduces geometric information filtering. The core is a geometric constraint module: at every attention layer, the correspondences obtained by matching geometrically constrain the feature update, realizing an adaptive pooling operation that filters out large numbers of irrelevant features; iterative feature updates then yield the final feature correspondences (a schematic sketch follows this item). Extensive experimental analysis of the proposed algorithm shows that, after the geometric constraints are introduced, many match-irrelevant features are eliminated, the influence of noise is reduced, and both matching accuracy and computational efficiency improve in large-viewpoint scenes.
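The following PyTorch sketch illustrates, under my own assumptions, the kind of correspondence-gated attention described above: after each cross-attention update, tentative dual-softmax correspondences decide which keypoints keep participating. The module name, the shared weights across iterations, and the keep_thresh value are illustrative choices, not the thesis implementation.

```python
# Hypothetical sketch of correspondence-gated ("geometry-filtered") attention.
# Module names, shared weights across iterations, and keep_thresh are my
# assumptions, not the thesis implementation.
import torch
import torch.nn as nn


class GeometryFilteredMatcher(nn.Module):
    def __init__(self, dim=256, heads=4, iters=3, keep_thresh=0.1):
        super().__init__()
        self.iters, self.keep_thresh = iters, keep_thresh
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feat_a, feat_b):
        # feat_a: (1, Na, dim), feat_b: (1, Nb, dim) keypoint descriptors.
        mask_b = torch.zeros(feat_b.shape[:2], dtype=torch.bool,
                             device=feat_b.device)   # True = filtered out
        for _ in range(self.iters):
            # Update A by attending only to the still-active part of B.
            feat_a = feat_a + self.cross_attn(
                feat_a, feat_b, feat_b, key_padding_mask=mask_b)[0]
            # Tentative correspondences via dual-softmax on similarities.
            sim = torch.einsum('bnd,bmd->bnm', feat_a, feat_b)
            scores = sim.softmax(dim=-1) * sim.softmax(dim=-2)
            # "Adaptive pooling": drop B keypoints whose best match stays
            # unconfident (a real version must keep at least a few alive).
            mask_b = scores.max(dim=1).values < self.keep_thresh
        return scores                                # (1, Na, Nb)
```

Masking low-confidence keypoints shrinks the effective attention context, which is where gains in both robustness and computational cost would come from in such a design.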

 

2. A visual localization method based on ground-aerial multi-view feature fusion: The core of traditional visual localization methods is local feature matching, which depends on similarity in local content and detail. In practical applications it is common that the capturing cameras' poses differ so much that the overlapping regions between images almost disappear and not enough locally similar features can be obtained, as in ground-to-aerial matching; local feature matching then fails to localize precisely. At the same time, images from different viewpoints have different characteristics: the ground view usually has higher and clearer resolution but a smaller field of view, presenting more concrete content, whereas the aerial view has lower resolution but contains rich semantic and scene-structure information, so the viewpoints complement each other well. Therefore, for such scenes with extreme viewpoint differences, building on existing scene regression methods, this thesis proposes a ground-aerial multi-view fusion method: the known capturing camera poses are used to fuse the multi-view images into BEV (Bird's Eye View) features that contain rich scene information and provide a global top-down view, helping to understand and analyze the objects and structures in the environment; matching is performed in the BEV feature space, and the pose of the query image is regressed directly, achieving precise localization (a schematic sketch follows this item). Extensive experiments comparing this method with current local feature matching methods show that, in scenes with extreme viewpoint differences, it surpasses traditional localization methods and achieves accurate and robust localization.
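The core geometric step, projecting multi-view features onto a common ground-plane BEV grid using the known poses, might look like the sketch below. The z = 0 ground-plane assumption, the grid size, the mean fusion, and all names are my own illustrative choices; the thesis's fusion and pose-regression heads are not reproduced here.

```python
# Hypothetical ground-plane BEV fusion with known poses. The z = 0 plane,
# grid size, and mean fusion are illustrative assumptions.
import torch
import torch.nn.functional as F


def splat_to_bev(feats, K, T_cam_from_world, grid=64, extent=20.0):
    """feats: list of (C,H,W) per-view feature maps; K: (3,3) intrinsics;
    T_cam_from_world: list of (4,4) known camera poses.
    Returns a fused (C, grid, grid) BEV feature map."""
    xs = torch.linspace(-extent, extent, grid)
    gx, gy = torch.meshgrid(xs, xs, indexing='xy')
    # Homogeneous ground points on the z = 0 plane of the world frame.
    pts = torch.stack([gx, gy, torch.zeros_like(gx),
                       torch.ones_like(gx)], -1).reshape(-1, 4)
    bev, hits = 0.0, 0.0
    for f, T in zip(feats, T_cam_from_world):
        cam = (T @ pts.T)[:3]                        # world -> camera
        uv = K @ cam
        uv = uv[:2] / uv[2].clamp(min=1e-6)          # pinhole projection
        H, W = f.shape[1:]
        g = torch.stack([uv[0] / W, uv[1] / H], -1) * 2 - 1
        valid = (cam[2] > 0).float().reshape(1, grid, grid)
        s = F.grid_sample(f[None], g.reshape(1, grid, grid, 2),
                          align_corners=False)[0]
        bev, hits = bev + s * valid, hits + valid    # mean over seeing views
    return bev / hits.clamp(min=1.0)
```

A pose head would then compare the query's features against this fused grid and regress a planar pose (x, y, yaw); averaging is the simplest fusion choice, and a learned weighting would be equally plausible.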

Other Abstract

Visual localization is a very important research task in the field of computer vision. It mainly uses the camera as the primary sensor and studies the localization problem through the images taken by the camera, and it has deep and extensive applications in mobile robots, autonomous vehicles, augmented reality, and other fields. Visual localization not only has rich research value but also broad application prospects.

 

Current visual localization solutions typically rely on pre-built 3D maps and achieve precise localization through local feature matching. In recent years, learned feature descriptors have gradually replaced hand-crafted features; the robustness of features to lighting and seasonal changes has steadily improved, reducing the dependence on feature detection accuracy and greatly improving performance on repetitive patterns and weakly textured scenes. However, for scenes with significant camera viewpoint changes, which often occur in practical applications, effective and reliable keypoints cannot be extracted because the overlapping areas are limited. Current methods, which neglect geometric structure, rely too much on local content features, and lack a grasp of global semantics, often perform poorly and generally cannot meet the requirements of accuracy and robustness.

 

Therefore, this thesis focuses on the visual localization problem in large-viewpoint scenes. Building on existing research, it introduces geometric constraints and improves the keypoint feature pooling method; at the same time, it proposes a feature fusion method for ground-aerial multi-view scenarios with extreme viewpoint spans, remedying the shortcomings of current work in such scenarios. The research work and achievements are as follows:

 

1. Local Feature Matching Algorithm Based on Geometric Information Filtering: As a part of visual localization, feature matching is required to estimate the camera pose accurately under day-night, weather, seasonal, and viewpoint changes. For temporal and seasonal variations, deep features are often able to learn this invariance. However, for scenes with large viewpoint changes, there are few key effective regions, insufficient repeatable features, and much invalid information and noise, so the performance of current deep matchers deteriorates when the viewpoint changes significantly. This thesis proposes a local feature matching algorithm that introduces geometric constraints to address this issue. The core is a geometric constraint module which, at each attention layer, uses the matching correspondences to geometrically constrain the feature update, achieving an adaptive pooling operation that filters out a large number of irrelevant features; iterative feature updates yield the final feature correspondences. Extensive experimental analysis of the proposed algorithm shows that after introducing the geometric constraints, a large number of match-irrelevant features are eliminated, the influence of noise is reduced, and the matching accuracy and computational efficiency in large-viewpoint scenes are improved.

 

2. Visual Localization Method Based on Ground-Aerial Multi-View Feature Fusion: The core of traditional visual localization methods is local feature matching. However, local feature matching depends on the similarity of local content and detail. In practical applications, it is common to encounter scenes where the camera poses differ so greatly that the overlapping areas between images almost disappear and sufficient locally similar features cannot be obtained, such as ground-to-aerial matching. In this case, local feature matching often cannot localize accurately. At the same time, images from different viewpoints have different characteristics: the ground view often has a larger and clearer resolution but a smaller field of view, presenting more concrete content, while the aerial view has a smaller resolution but contains rich semantic and scene structure information, so the different viewpoints complement each other well. Therefore, for scenes with significant viewpoint differences, building on existing scene regression methods, this thesis proposes a ground-aerial multi-view fusion method. Known camera poses are used to fuse multi-view images into BEV features containing rich scene information, matching is performed in the BEV feature space, and the pose of the query image is directly regressed to achieve accurate localization. Extensive experiments comparing this method with current local feature matching methods show that, in scenes with significant viewpoint differences, it can surpass traditional localization methods and achieve accurate and robust localization.

Keywords: Visual Localization; Image Matching; Feature Filtering; Feature Fusion
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/57206
Collection: 毕业生_硕士学位论文
Recommended Citation (GB/T 7714):
邓加昕. 基于特征匹配的大视角场景视觉定位方法研究[D]. 2024.
Files in This Item:
论文网上提交版本.pdf (2657 KB) | DocType: Thesis | Access: Restricted | License: CC BY-NC-SA
 
