Research on Local Image Feature Description Methods Based on Deep Learning
田雨润
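Subtype: Doctoral
Thesis Advisor: 吴福朝; 樊彬
2019-05-22
Degree Grantor: Institute of Automation, Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Name: Doctor of Engineering
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: local feature descriptor; deep learning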
Abstract

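Local image feature matching is a key problem in computer vision. Its goal is to accurately establish correspondences between pixel locations in two or more images, which is a prerequisite for many computer vision tasks such as 3D reconstruction, object recognition, image registration, and image stitching.

Feature description is the main means of feature matching, and research on it has made considerable progress in recent years. In particular, with the development of deep learning and the emergence of a series of large-scale annotated datasets, learning-based image feature description methods have achieved much higher matching accuracy than traditional hand-crafted methods.

However, owing to the diversity of imaging scenes and the complexity of imaging conditions, local image feature description remains a highly challenging research topic.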

 

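This thesis studies the problem of local image feature description for matching in depth, focusing on how to use deep convolutional neural networks (CNNs), a powerful tool, to learn robust local image feature representations. The main contributions are as follows:

A high-performance descriptor under Euclidean distance, L2Net, is proposed.

The descriptor learning process is redesigned in three respects: the CNN architecture, the sampling of training examples, and the loss function.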

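(1) A CNN architecture suited to descriptor learning is proposed.

Current mainstream CNN architectures are mostly designed for extracting global image features, whereas local image feature description operates on local image patches, which are smaller than whole images and contain less redundant information.

To address this, a network architecture suited to extracting features from local patches is proposed.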

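(2) An efficient training-set sampling method is proposed.

The imbalance between positive and negative samples is a major difficulty in descriptor learning: negative samples far outnumber positive ones, so it is impossible to traverse all negative samples while also sampling a large number of positive samples.

To address this, an efficient sampling strategy, Progressive Sampling, is proposed. It can rapidly sample hundreds of millions of negative samples, which significantly improves training efficiency.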

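(3) A new loss function is proposed.

The loss function contains three terms, which respectively optimize the matching ability of the descriptors, the discriminability of the network's intermediate feature maps, and the correlation among the descriptor dimensions.

Experimental results show that training with this loss function significantly improves descriptor matching performance.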

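A high-performance descriptor based on second order similarity, SOSNet, is proposed.

Existing descriptor learning methods all optimize the first order similarity of descriptors.

Second order similarity characterizes the mutual relationship between two points in feature space and is more robust.

Therefore, a regularization term based on second order similarity is introduced into descriptor training.

Experiments show that combining first and second order similarity during training greatly improves descriptor matching performance.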

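A matching performance evaluation method based on the distribution of descriptors in their space is proposed.

Existing performance indicators such as FPR (False Positive Rate) and mAP (mean Average Precision) only measure the matching ability of descriptors directly and offer little guidance for descriptor design.

To address this, an evaluation indicator that analyzes how descriptors are distributed in their space is proposed. It is used to explain the relationship between descriptor distribution and performance, and the performance of existing representative feature descriptors is analyzed and compared.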

Other Abstract

The process of describing local patches is a fundamental component in many computer vision tasks such as 3D reconstruction, large scale image localization, and image retrieval.

In recent years, large datasets with corresponding ground truths have led to the development of large scale learning methods, which stimulated a wave of works on descriptor learning.

Recent work has shown that these learning based methods are able to significantly outperform their hand-crafted counterparts.

However, existing deep learning based descriptors are not yet competent enough to fully replace their hand-crafted counterparts, so considerable effort is still needed in this area.

 

In this dissertation, we tackle the problem of local feature descriptor learning via deep learning techniques.

Specifically, we focus on utilizing convolutional neural networks (CNNs) to learn robust descriptors for feature matching.

The main contributions of this thesis are:

We propose a descriptor named L2Net that can be matched under Euclidean distance.
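
Matching under Euclidean distance can be illustrated with a minimal nearest-neighbour sketch. It assumes descriptors are stored as rows of NumPy arrays; the function name match_l2 and the synthetic data are illustrative assumptions and are not taken from the thesis.

```python
import numpy as np

def match_l2(desc_a, desc_b):
    """For each row of desc_a, return the index of its nearest row in desc_b
    under Euclidean distance, together with that distance."""
    # Squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = (desc_a ** 2).sum(1)[:, None] + (desc_b ** 2).sum(1)[None, :] \
        - 2.0 * desc_a @ desc_b.T
    dist = np.sqrt(np.maximum(sq, 0.0))  # clamp tiny negatives from rounding
    nn = dist.argmin(axis=1)
    return nn, dist[np.arange(len(desc_a)), nn]

# Synthetic example (hypothetical data): desc_a is a shuffled, slightly
# perturbed copy of desc_b, so the true correspondence is known.
rng = np.random.default_rng(0)
desc_b = rng.normal(size=(5, 128))
desc_b /= np.linalg.norm(desc_b, axis=1, keepdims=True)
perm = np.array([3, 0, 4, 1, 2])
desc_a = desc_b[perm] + 0.01 * rng.normal(size=(5, 128))
desc_a /= np.linalg.norm(desc_a, axis=1, keepdims=True)

nn, nn_dist = match_l2(desc_a, desc_b)
```

In this toy setting nn recovers perm, because the perturbation is much smaller than the typical distance between unrelated unit vectors.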

We design the descriptor learning process from three aspects:

(1) A new CNN architecture for local feature descriptor learning is proposed.

Current CNN architectures are mostly designed for global feature extraction. Unlike whole images, however, local patches have lower resolution and less redundant information, so simply modifying existing CNN models designed for whole images would be inappropriate.

To solve this problem, a new CNN architecture is proposed.

(2) A progressive sampling strategy is proposed.

In the descriptor learning problem, the positive and negative samples are highly unbalanced, i.e., negative samples far outnumber positive samples.

The proposed progressive sampling strategy can visit billions of negative samples in just tens of training epochs and thus significantly accelerates training.

(3) A new loss function is proposed.

This loss function consists of three loss terms that optimize the matching performance of the descriptors, the similarity of the intermediate feature maps, and the compactness of the descriptors, respectively.

Experiments show that this loss function brings a consistent performance increase.
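
The abstract does not spell out how progressive sampling or the individual loss terms are implemented. The following is a minimal sketch of one plausible reading, assuming each batch holds p matching pairs of unit-length descriptors: every off-diagonal entry of the p x p distance matrix acts as a negative pair, so one batch yields p*p - p negatives, and compactness is penalised through the off-diagonal entries of the correlation matrix between descriptor dimensions. The function names and the batch layout are illustrative assumptions.

```python
import numpy as np

def batch_distance_matrix(y1, y2):
    """y1[i] and y2[i] are unit-length descriptors of a matching pair.
    Entry (i, j) is the Euclidean distance between y1[i] and y2[j]:
    the diagonal holds positive pairs, everything else negative pairs."""
    cos = np.clip(y1 @ y2.T, -1.0, 1.0)
    return np.sqrt(2.0 * (1.0 - cos))  # valid because rows have unit norm

def compactness_penalty(y):
    """Sum of squared off-diagonal entries of the correlation matrix
    between descriptor dimensions (columns of y)."""
    z = y - y.mean(axis=0, keepdims=True)
    z = z / (np.linalg.norm(z, axis=0, keepdims=True) + 1e-12)
    corr = z.T @ z
    off_diag = corr - np.diag(np.diag(corr))
    return float((off_diag ** 2).sum())

# Hypothetical batch: p pairs of d-dimensional descriptors.
rng = np.random.default_rng(0)
p, d = 64, 16
y1 = rng.normal(size=(p, d))
y1 /= np.linalg.norm(y1, axis=1, keepdims=True)
y2 = y1 + 0.1 * rng.normal(size=(p, d))
y2 /= np.linalg.norm(y2, axis=1, keepdims=True)

dist = batch_distance_matrix(y1, y2)
n_negatives = dist.size - p          # p*p - p negative pairs per batch
penalty = compactness_penalty(y1)
```

Under this assumption the number of negatives grows quadratically with the batch size while the number of positives grows only linearly, which is one way a modest number of epochs could cover a very large number of negative pairs.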

We propose a descriptor named SOSNet by utilizing second order similarity in the training procedure.

Existing descriptor learning methods all optimize the first order similarity, i.e., the Euclidean distance.

Compared with first order similarity, second order similarity is more capable of capturing the geometric information of the data points and is thus more robust to geometric distortion.

In order to take advantage of this powerful metric, we propose a regularization term based on second order similarity that is used only during training.

Experiments show that when this regularization term is combined with another loss optimizing first order similarity, more robust descriptors can be learned.
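
The exact form of the regularization term is not given in the abstract. As a hedged sketch, one common way to express second order similarity is to compare, for each matching pair (x_i, x_i+), the distances from x_i to the other anchors with the distances from x_i+ to the other positives. The formulation and the function names below are assumptions made for illustration.

```python
import numpy as np

def pairwise_l2(x):
    """Euclidean distance between every pair of rows of x."""
    sq = (x ** 2).sum(1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.sqrt(np.maximum(d2, 0.0))

def second_order_regulariser(x, x_pos):
    """Mean over i of sqrt(sum_{j != i} (d(x_i, x_j) - d(x_i+, x_j+))^2),
    where x[i] and x_pos[i] are descriptors of a matching pair."""
    diff = pairwise_l2(x) - pairwise_l2(x_pos)
    np.fill_diagonal(diff, 0.0)  # exclude j == i
    return float(np.sqrt((diff ** 2).sum(axis=1)).mean())

# Hypothetical data: 32 matching pairs of 128-dimensional descriptors.
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 128))
x_pos = x + 0.05 * rng.normal(size=(32, 128))

reg_same = second_order_regulariser(x, x)       # identical geometry
reg_noisy = second_order_regulariser(x, x_pos)  # perturbed geometry
```

The term is zero when both sets have the same internal distance structure and grows as that structure diverges, so it constrains relationships between points, not just the distance within each pair.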

A new evaluation protocol is proposed to analyze the distribution of the learned descriptor space:

FPR (False Positive Rate) and mAP (mean Average Precision) are the most commonly used performance indicators for evaluating local feature descriptors.

However, these indicators fail to reveal the link between matching performance and the distribution of descriptors.

Therefore, a new performance indicator based on the von Mises-Fisher distribution is proposed to shed light on what makes certain descriptors good at matching.
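
The abstract does not define the indicator itself. As background only, the von Mises-Fisher distribution on the unit sphere has density proportional to exp(kappa * mu^T x), with mean direction mu and concentration kappa. The sketch below fits both to a set of unit-length descriptors, using the widely used closed-form approximation kappa ~ r(d - r^2) / (1 - r^2), where r is the mean resultant length. The function name and test data are illustrative assumptions, and this is not claimed to be the thesis's indicator.

```python
import numpy as np

def vmf_fit(x):
    """Estimate the mean direction and concentration of a von Mises-Fisher
    distribution from unit-length rows of x, using the closed-form
    approximation kappa ~ r * (d - r^2) / (1 - r^2)."""
    n, d = x.shape
    s = x.sum(axis=0)
    s_norm = np.linalg.norm(s)
    mu = s / s_norm
    r_bar = s_norm / n  # mean resultant length, between 0 and 1
    kappa = r_bar * (d - r_bar ** 2) / (1.0 - r_bar ** 2)
    return mu, kappa

def unit_rows(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Hypothetical data: a tight and a loose cluster around the same direction.
rng = np.random.default_rng(0)
d = 32
centre = np.zeros(d)
centre[0] = 1.0
tight = unit_rows(centre + 0.05 * rng.normal(size=(500, d)))
loose = unit_rows(centre + 0.50 * rng.normal(size=(500, d)))

mu_tight, kappa_tight = vmf_fit(tight)
mu_loose, kappa_loose = vmf_fit(loose)
```

A larger kappa means the descriptors are more tightly concentrated around their mean direction, which is the kind of distributional property such an indicator could relate to matching performance.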

Pages: 99
Language: Chinese
Document Type: Dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/23808
Collection: National Laboratory of Pattern Recognition_Robot Vision
Recommended Citation
GB/T 7714
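田雨润. Research on Local Image Feature Description Methods Based on Deep Learning[D]. Institute of Automation, Chinese Academy of Sciences. Institute of Automation, Chinese Academy of Sciences, 2019.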
Files in This Item:
File Name/Size: Thesis.pdf (10072KB)
DocType: Dissertation
Access: Open Access
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.