CASIA OpenIR > Graduates > Doctoral Dissertations
融合传统特征知识的深度学习图像特征点提取与匹配 (Deep Learning-Based Image Feature Point Extraction and Matching Incorporating Traditional Feature Knowledge)
傅禹杰
2024-05-18
Pages: 150
Subtype: Doctoral
Abstract

Image feature point extraction and matching comprises three steps: keypoint detection, local descriptor extraction, and feature point matching. It aims to estimate point correspondences between two images that share a covisible region, and it is an essential foundation for 3D computer vision tasks such as structure from motion, visual localization, and simultaneous localization and mapping, with broad and significant applications in emerging industries such as service robots, autonomous driving, augmented reality, and smart cities. Driven by deep learning, image feature point extraction and matching methods have made remarkable progress in recent years, yet they still face many problems in complex, difficult scenes. First, state-of-the-art deep learning-based keypoint detection methods do not fully account for the matching reliability of keypoints and tend to extract hard-to-match keypoints from weakly distinctive regions. Second, state-of-the-art deep learning-based feature point extraction methods neither properly constrain the positions of feature points nor adequately consider the compatibility between local descriptors and keypoints, which reduces matching accuracy. Moreover, existing feature point extraction networks tend to blur feature points, making it difficult to extract reliable feature points precisely. In addition, existing feature point extraction and matching methods are not robust to drastic image scale changes. To address these problems, this dissertation distills targeted prior knowledge from traditional feature point extraction methods and uses it to guide and constrain the learning, extraction, and matching of feature points, yielding a systematic set of more accurate and robust feature point extraction and matching methods. The main contributions are as follows:

1. A learning-based image keypoint detection method that annotates local saliency knowledge. State-of-the-art learning-based keypoint detection methods mainly train neural networks with a repeatability-based covariant constraint and therefore often detect many smooth edge points that cannot be matched reliably. Traditional keypoint detection methods, by contrast, mainly detect corners and blobs that are locally salient and distinctive, and hence highly reliable for matching, while effectively suppressing smooth edge points. To obtain a deep learning-based keypoint detector with high matching reliability, this dissertation first proposes a general local saliency measure that mines the local saliency knowledge embedded in traditional keypoint detection methods. It then proposes a local salient structure preserving loss function that properly combines this knowledge with the covariant constraint, producing a keypoint detection network with both high repeatability and high matching reliability. Experiments on diverse scenes with drastic illumination and viewpoint changes show that the proposed method achieves higher repeatability and matching accuracy than existing keypoint detection methods.
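As an illustration of the kind of handcrafted saliency knowledge such a measure can draw on, the sketch below computes the classical Harris corner response with NumPy. The window radius, the constant k = 0.04, and the box filter are illustrative choices only, not the dissertation's actual measure.

```python
import numpy as np

def harris_saliency(img, k=0.04):
    """Harris corner response: one classical local saliency measure.

    Positive response -> corner (locally salient, reliably matchable);
    strongly negative -> edge; near zero -> flat region.
    """
    Iy, Ix = np.gradient(img.astype(np.float64))  # gradients along rows, cols

    def box(a, r=1):
        # (2r+1)x(2r+1) box filter via shifted sums (wraps at borders,
        # which is fine for interior pixels in this illustration).
        out = np.zeros_like(a)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += np.roll(np.roll(a, dy, axis=0), dx, axis=1)
        return out / (2 * r + 1) ** 2

    # Smoothed structure-tensor entries, then det - k * trace^2.
    Sxx, Syy, Sxy = box(Ix * Ix), box(Iy * Iy), box(Ix * Iy)
    return Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2
```

On a synthetic image containing a single bright square, the response is positive at the square's corner, negative along its edges, and zero in flat areas; this corner/edge/flat separation is exactly what makes such measures useful as annotation signals.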

2. An image feature point extraction learning method guided by a covariant peak constraint. State-of-the-art keypoint detection learning methods based on the covariant constraint are still prone to detecting inaccurate, unreliable keypoints, because their loss functions do not properly constrain the shape of the probability distribution in local keypoint probability maps. To address this, a covariant peak constraint loss function is proposed that constrains both the expectation and the shape of each local keypoint probability distribution through a differentiable sampling scheme; it improves keypoint localization accuracy while following the traditional idea of suppressing unreliable edge points. Furthermore, to fully account for the compatibility between local descriptors and keypoints, a conditional neural reprojection error loss function is proposed; it simulates the feature point matching process during training and efficiently constrains the network to optimize local descriptors for the very keypoints it detects. Experiments on several challenging datasets show that the feature point extraction network trained with the proposed method achieves higher matching accuracy than existing methods.
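To make the constrained quantities concrete, the following sketch computes a differentiable expectation (a soft-argmax) over a local keypoint probability patch, together with a simple shape statistic. The `peakiness` statistic is a hypothetical stand-in for how "sharpness" might be quantified, not the dissertation's loss function.

```python
import numpy as np

def softmax2d(logits, t=1.0):
    """Temperature-scaled softmax over a 2-D patch of logits."""
    z = (logits / t).ravel()
    z = np.exp(z - z.max())          # subtract max for numerical stability
    return (z / z.sum()).reshape(logits.shape)

def soft_argmax(prob):
    """Differentiable expectation of the keypoint position: the
    probability-weighted average of pixel coordinates (sub-pixel)."""
    h, w = prob.shape
    ys, xs = np.mgrid[0:h, 0:w]
    return float((prob * ys).sum()), float((prob * xs).sum())

def peakiness(prob):
    """Toy shape statistic: probability mass at the mode. A flat,
    edge-like distribution scores ~1/(h*w); a sharp peak scores near 1."""
    return float(prob.max())
```

A loss that only supervises the soft-argmax expectation can be satisfied by a flat ridge of probability; additionally constraining a shape statistic like the one above forces the distribution into a single sharp peak.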

3. An image feature point extraction network guided by local saliency-based diffusion. Existing deep learning-based feature point extraction methods mainly rely on ordinary convolutional layers to localize feature points; such layers tend to smooth and blur feature points, degrading their localization accuracy and repeatability. To address this, a local saliency-based diffusion module is proposed. The module simulates the nonlinear diffusion process of traditional image processing and controls that process with the local saliency knowledge embedded in traditional feature point extraction methods, thereby enhancing the saliency of feature points in the network's feature maps and improving their localization accuracy and repeatability. In addition, a feature point selection method based on local descriptor reliability is proposed: it identifies reliable feature points from matching results during training and teaches the network to automatically filter out feature points with unreliable local descriptors. A feature point extraction network is designed with the proposed module and trained with the proposed method; experiments show that it achieves state-of-the-art matching accuracy on several difficult datasets.
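The saliency-controlled diffusion idea can be sketched as an explicit Perona-Malik-style update whose conductivity is derived from a saliency map: where saliency is high, diffusion is suppressed and the response survives; elsewhere, the map is smoothed. The mapping g = 1 - saliency and the step size are illustrative assumptions, not the module's actual formulation.

```python
import numpy as np

def saliency_controlled_diffusion(feat, saliency, steps=10, dt=0.2):
    """Explicit nonlinear diffusion on a 2-D response map.

    Conductivity g is low where local saliency is high, so salient
    responses (feature points) are preserved while flat or edge-like
    regions are smoothed out. dt <= 0.25 keeps the scheme stable.
    """
    f = feat.astype(np.float64).copy()
    g = 1.0 - np.clip(saliency, 0.0, 1.0)   # conductivity from saliency
    for _ in range(steps):
        # 5-point Laplacian with replicated borders.
        up    = np.vstack([f[:1], f[:-1]])
        down  = np.vstack([f[1:], f[-1:]])
        left  = np.hstack([f[:, :1], f[:, :-1]])
        right = np.hstack([f[:, 1:], f[:, -1:]])
        lap = up + down + left + right - 4 * f
        f += dt * g * lap
    return f
```

Running this on a map with two equal spikes, one marked salient and one not, leaves the salient spike untouched while the other decays, which is the sharpening effect the module aims for inside a learned feature map.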

4. A covisible-area-aware image feature point matching method that is robust to scale changes. When there is a drastic scale change between two images, existing feature point extraction and matching methods struggle to estimate enough correct point correspondences for visual tasks. Based on the mathematical principles of traditional feature point extraction methods, this dissertation analyzes the cause of this problem and proposes a scale-difference-aware feature point matching method that quantifies the scale difference between two images as a scale ratio. Before extracting feature points, the method estimates the scale ratio and rescales the images accordingly, which markedly reduces the scale difference and greatly increases the number of correct correspondences. To estimate the scale ratio accurately, a covisibility-attention-based matching enhancement module is proposed and used to design a scale ratio estimation network; the module makes the network focus on the covisible regions of the two images and suppresses distraction from other regions, improving estimation accuracy. Experiments on multiple scenes show that, under drastic scale changes, the proposed method significantly improves the matching accuracy of existing feature point extraction and matching methods.
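The rescaling step before extraction can be sketched as follows. A toy nearest-neighbor resize stands in for a proper image resampling, and the scale ratio is assumed to have been estimated already (in the dissertation, by the scale ratio estimation network).

```python
import numpy as np

def rescale_nearest(img, s):
    """Toy nearest-neighbor rescaling of a 2-D image by factor s
    (a stand-in for a proper anti-aliased resize)."""
    h, w = img.shape
    nh, nw = max(1, round(h * s)), max(1, round(w * s))
    ys = np.minimum((np.arange(nh) / s).astype(int), h - 1)
    xs = np.minimum((np.arange(nw) / s).astype(int), w - 1)
    return img[np.ix_(ys, xs)]

def align_scales(img_a, img_b, scale_ratio):
    """Shrink the 'zoomed-in' image by the estimated scale ratio so that
    both images are extracted at comparable scales.

    scale_ratio is the scale of A relative to B; > 1 means A is zoomed in.
    """
    if scale_ratio >= 1.0:
        return rescale_nearest(img_a, 1.0 / scale_ratio), img_b
    return img_a, rescale_nearest(img_b, scale_ratio)
```

After this alignment, an off-the-shelf extractor and matcher operate on images with a much smaller scale gap, which is what lets the method boost existing pipelines without retraining them.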


Keywords: image matching; keypoint detection; local descriptor extraction; local saliency knowledge; deep learning
MOST Discipline Catalogue: Engineering :: Control Science and Engineering
Language: Chinese
IS Representative Paper
Sub-direction classification: 3D vision
Planning direction of the national key laboratory: visual information processing
Paper associated data
Document Type: Dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/57190
Collection: Graduates / Doctoral Dissertations
Corresponding Author: 傅禹杰
Recommended Citation
GB/T 7714
傅禹杰. 融合传统特征知识的深度学习图像特征点提取与匹配[D],2024.
Files in This Item:
File Name/Size | DocType | Access License
融合传统特征知识的深度学习图像特征点提取 (41077 KB) | Dissertation | Restricted access, CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.