Small Sample Learning in Object Classification and Detection (物体分类与检测中的小样本学习)
李岩
Subtype: Doctoral (博士)
Thesis Advisor: 黄凯奇
2019-05-21
Degree Grantor: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Discipline: Pattern Recognition and Intelligent Systems (模式识别与智能系统)
Keyword: Object Classification; Object Detection; Small Sample Learning; Zero-Shot Learning; Mixed Supervised Detection
Abstract

Object classification and object detection have long been two of the most fundamental problems in computer vision, underpinning other high-level complex vision tasks such as semantic segmentation, image parsing, action recognition, object tracking, and image/video understanding. In recent years, with the development of deep learning techniques such as convolutional neural networks and the introduction of large-scale datasets, object classification and detection have made great progress in both performance and the scale of application scenarios. Existing classification and detection algorithms, however, typically rely on supervised training with large amounts of annotated data: the classification task requires image-level category labels, while the detection task additionally requires precise annotations of object locations within each image. Accurately annotating large numbers of objects is difficult and expensive, especially since real-world applications must handle thousands upon thousands of new object concepts that keep changing over time; collecting massive annotated samples for these new concepts in a supervised fashion is laborious and sometimes even infeasible. It is therefore necessary to study how to obtain reliable object recognition models with as little annotated data as possible; we refer to this learning style, which trains models with few annotated samples, as small sample learning. Depending on the quantity of training samples and the quality of their annotations, small sample learning has three settings: 1) the training data of the test categories are entirely missing; 2) only a few training samples, with complete annotations, are available for some or all test categories; 3) the annotations of the training samples for some or all test categories are coarse and incomplete. Small sample learning techniques are applicable across computer vision and pattern recognition; this thesis explores their application to object classification and object detection, focusing mainly on the first and the third settings, and carries out the following research:

1. A zero-shot learning model based on semantic structural constraints. Zero-shot learning aims to make image classification models accurately classify categories that never appear in the training set. It belongs to the object classification problem and conforms to the first small sample learning setting, in which the training samples of the test categories are entirely missing.

Classical zero-shot learning algorithms generally adopt a separated two-step pipeline: a pre-trained deep model first extracts features of the training images, and these fixed features are then combined with auxiliary semantic information to learn a visual-semantic embedding space. For the zero-shot learning task, such a pipeline fails to exploit the structural information contained in the auxiliary semantics, so the image features used for training lack semantic structure. To address this problem, the thesis proposes an end-to-end trainable deep semantic structural constraints model with two distinct constraints. The first, an image feature structure constraint, transforms the cluttered image feature space into a regular space that follows the semantic structure. The second, a semantic embedding structure-preserving constraint, ensures that in the final embedding space each object is represented not by an isolated point but by a manifold consistent with the semantic structure, which strengthens the generalization ability of the model. With the help of the semantic structural information, the trained model acquires more useful cues for zero-shot learning. The proposed model achieved the best results at the time on multiple datasets, verifying the effectiveness of the method.

2. A zero-shot learning model based on discriminative learning of latent features. Building on the previous work, this work further improves zero-shot learning from the perspective of latent discriminative feature representations. Existing zero-shot learning work has long focused on learning a proper projection matrix between the visual space and the semantic space to align the representations of the same object in both spaces, while the importance of learning discriminative features for zero-shot learning has been ignored. In this work, we revisit the pipeline of existing zero-shot learning methods and demonstrate the importance of learning discriminative features in both the visual and the semantic space. The thesis proposes an end-to-end deep network that learns more discriminative representations in each of the two spaces. In the visual space, a zoom network automatically discovers discriminative regions from the whole image; in the semantic space, the algorithm learns more discriminative semantic representations of objects in an augmented attribute space, which contains both user-defined and latent attributes. Extensive experiments on two zero-shot learning datasets show that the algorithm greatly surpasses the best contemporaneous performance in the field, proving its effectiveness.

3. A transductive zero-shot learning model based on visual center adaptation. Because the training and test categories in zero-shot learning do not overlap, a model learned on the training categories often fails to perform equally well on the test categories, a problem commonly known as domain shift. To address it, the thesis proposes a transductive zero-shot learning model based on visual center adaptation. For the training categories, the algorithm learns a projection from the semantic representation of each object to its class center in the visual space. When building the analogous projection for the test categories, the algorithm introduces a constraint based on the symmetric Chamfer distance, which pulls the distribution of the projected (synthetic) visual centers of the test categories toward the distribution of the simulated real test-category centers obtained by clustering in the visual space. The aligned test-category centers are then used for zero-shot prediction, greatly improving the generalization ability of the model. Tests on several widely used zero-shot datasets show that the proposed model achieves the best performance in the field.

4. A mixed supervised detection model based on robust objectness transfer. Mixed supervised detection aims to leverage existing fully annotated categories to assist weakly supervised detection on new categories. In its experimental setting, the algorithm assumes access to a set of existing fully annotated categories whose training images carry both category labels and bounding-box location annotations; the goal is to detect a set of entirely new, weakly annotated categories whose training images have only image-level category labels. Mixed supervised detection therefore conforms to the third small sample learning setting, in which the annotations of the training data for the test categories are coarse and weak.

Previous mixed supervised detection methods generally modify the model directly, transferring a detector trained on existing categories to new categories through heuristic, hand-crafted rules. When the detection model is a multi-layer convolutional neural network, this transfer usually only changes the parameters of the last layer. Such methods rely heavily on heuristics, and the idea of transferring a model by changing only its last layer is rather simplistic and works only under certain ideal conditions. In contrast, this thesis proposes a more reasonable and more robust mixed supervised detection model based on objectness knowledge transfer. The algorithm first learns objectness knowledge from the existing fully annotated categories; this knowledge is modeled on domain-invariant features and is therefore robust to the distribution discrepancy between the existing and the new categories. Guided by the objectness knowledge, the algorithm then uses multiple instance learning (MIL) to jointly model both the complete objects and the distractors (e.g., object parts) in an image, further improving the ability to reject distractors in weakly labeled images of the new categories. The proposed model is evaluated on the ILSVRC2013 detection dataset and the Pascal VOC datasets; the experiments show that it outperforms the existing mixed supervised detection methods of the time and achieves strong performance.

Other Abstract
In the field of computer vision, object classification and detection have long been two of the most fundamental problems. They are the basis of many other complex vision tasks, such as semantic segmentation, image parsing, action recognition, tracking, and image/video understanding. Recently, with the rapid development of convolutional neural networks (CNNs) and the introduction of challenging large-scale datasets, object classification and detection have improved drastically in both performance and scale. The supervised training process of existing object classification and detection methods typically requires a large number of labeled training samples with object class and localization annotations. However, exact annotations for a great many objects are difficult and expensive to acquire. Moreover, in real-world applications, where we need to deal with hundreds of thousands of new concepts emerging and changing over time, it is laborious and even infeasible to collect millions of annotated samples. To address this issue, it is necessary to explore how to obtain reliable recognition models with as little supervision as possible, removing the restrictions imposed by annotation costs; we refer to this as small sample learning (SSL). Based on the quantity of training samples and the quality of the annotations used during training, SSL has three learning paradigms: 1) the training samples for some/all categories are entirely missing; 2) only a few training samples with complete annotations are accessible for some/all categories; 3) the annotations of training samples for some/all categories are coarse and insufficient. Small sample learning can be applied in many computer vision and pattern recognition fields, and in this thesis we focus on its application to object classification and detection. Among the three learning paradigms of SSL, our research concentrates on the first and the third. The contributions are as follows:


1. Deep semantic structural constraints for zero-shot learning. Zero-shot learning (ZSL) seeks to make image classification models able to classify categories that never appear in the training set. Zero-shot learning belongs to the object classification problem and conforms to the first SSL paradigm, in which the training samples of the test categories are entirely missing.


Typical ZSL methods adopt a separated two-step pipeline: image features are first extracted with pre-trained CNN models, and the fixed image features are then utilized, together with some auxiliary semantic information, to learn a visual-semantic embedding space. This pipeline leaves the image features without the specific structural semantic information needed for the ZSL task. We propose an end-to-end trainable deep semantic structural constraints (DSSC) model to address this issue. The proposed model contains an image feature structure constraint and a semantic embedding structure constraint, which aim to learn structure-preserving image features and to endow the learned embedding space with stronger generalization ability, respectively. With the assistance of semantic structural information, the model gains more auxiliary clues for ZSL. The state-of-the-art performance certifies the effectiveness of the proposed method.
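To make the pairing of the two constraints concrete, below is a minimal PyTorch sketch of how such losses could be wired up. The names (SemanticEmbedding, dssc_style_losses) and the exact loss forms are illustrative assumptions for exposition, not the thesis's published formulation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SemanticEmbedding(nn.Module):
        """Projects CNN image features into the class-attribute (semantic) space."""
        def __init__(self, feat_dim, attr_dim):
            super().__init__()
            self.proj = nn.Linear(feat_dim, attr_dim)

        def forward(self, img_feat):          # img_feat: (B, feat_dim)
            return self.proj(img_feat)        # (B, attr_dim)

    def dssc_style_losses(emb, class_attr, labels):
        """emb: (B, A) embedded images; class_attr: (C, A) per-class attribute
        vectors; labels: (B,) class indices. Returns the two constraint losses."""
        # Image feature structure constraint (illustrative form): pull each
        # embedded image toward the attribute vector of its own class.
        feat_loss = F.mse_loss(emb, class_attr[labels])
        # Structure-preserving constraint (illustrative form): pairwise
        # distances among embedded samples should mirror the pairwise
        # distances among their classes' attribute vectors, so classes form
        # semantic manifolds rather than collapsing onto isolated points.
        d_emb = torch.cdist(emb, emb)
        d_sem = torch.cdist(class_attr[labels], class_attr[labels])
        struct_loss = F.mse_loss(d_emb, d_sem)
        return feat_loss, struct_loss

Both terms can be summed with a weighting coefficient and backpropagated through the CNN, which is what makes the pipeline end-to-end rather than two-step.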


2. Discriminative learning of latent features for zero-shot recognition. This work further improves ZSL with latent discriminative features (LDF). For years, the central task of existing ZSL methods has been to learn proper mapping matrices aligning the visual and semantic spaces, while the importance of learning discriminative representations for ZSL has been ignored. In this work, we revisit existing methods and demonstrate the necessity of learning discriminative representations for both the visual and semantic instances of ZSL. We propose an end-to-end network that is capable of 1) automatically discovering discriminative regions with a zoom network, and 2) learning discriminative semantic representations in an augmented space introduced for both user-defined and latent attributes. The proposed method is tested extensively on two challenging ZSL datasets, and the experimental results show that it significantly outperforms state-of-the-art methods.
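As a concrete illustration of an augmented attribute space, here is a minimal PyTorch sketch of a head that regresses the user-defined attributes and trains latent attributes discriminatively against per-class prototypes. The names (AugmentedAttributeHead, latent_proto) and the loss choices are our illustrative assumptions, not the exact LDF architecture.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AugmentedAttributeHead(nn.Module):
        """Predicts user-defined plus latent attributes from an image feature;
        the latent attributes are made discriminative among seen classes."""
        def __init__(self, feat_dim, n_user_attr, n_latent_attr, n_seen_classes):
            super().__init__()
            self.user = nn.Linear(feat_dim, n_user_attr)
            self.latent = nn.Linear(feat_dim, n_latent_attr)
            # One learnable latent-attribute prototype per seen class.
            self.latent_proto = nn.Parameter(torch.randn(n_seen_classes, n_latent_attr))

        def forward(self, feat, user_attr_gt, labels):
            # Regression ties the predictions to the human-defined attributes.
            user_loss = F.mse_loss(self.user(feat), user_attr_gt[labels])
            # A softmax over similarities to class prototypes makes the
            # latent attributes discriminative among the seen classes.
            logits = self.latent(feat) @ self.latent_proto.t()
            latent_loss = F.cross_entropy(logits, labels)
            return user_loss + latent_loss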


3. Transductive zero-shot learning via visual center adaptation. This work proposes a visual center adaptation method (VCAM) to address the domain shift problem in ZSL. For the seen classes in the training data, our method builds an embedding space by learning the mapping from the semantic space to the visual class centers. For the unseen classes in the test data, the construction of the embedding space is constrained by a symmetric Chamfer-distance term, which adapts the distribution of the synthetic visual centers to that of the real cluster centers. The learned embedding space therefore generalizes well to the unseen classes. Experiments on two widely used datasets demonstrate that our model significantly outperforms state-of-the-art methods.
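The symmetric Chamfer distance itself is standard enough to write down directly. The PyTorch sketch below matches each synthetic visual center to its nearest real cluster center and vice versa; the variable names are ours.

    import torch

    def symmetric_chamfer(synthetic_centers, cluster_centers):
        """synthetic_centers: (N, D) visual centers projected from unseen-class
        semantics; cluster_centers: (M, D) centers obtained by clustering
        unlabeled test features. Each point is matched to its nearest
        neighbor in the other set, and the squared distances are averaged
        in both directions."""
        d = torch.cdist(synthetic_centers, cluster_centers) ** 2   # (N, M)
        return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

Minimizing this term during training pulls the two point-set distributions together, which is the adaptation role it plays in the method described above.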


4. Mixed supervised object detection with robust objectness transfer. Mixed supervised detection (MSD) considers the problem of leveraging existing fully labeled categories to improve the weakly supervised detection (WSD) of new object categories. In MSD, we assume that a set of fully labeled categories is already available, with both bounding-box annotations and image-level labels; meanwhile, for the weakly labeled categories, we only have access to their image-level category labels. Thus MSD conforms to the third SSL paradigm, in which the available annotations are coarse and weak.


Different from previous MSD methods that directly transfer pre-trained object detectors from existing categories to new categories, we propose a more reasonable and robust objectness transfer approach for MSD. In our framework, we first learn domain-invariant objectness knowledge from the existing fully labeled categories. The knowledge is modeled on invariant features that are robust to the distribution discrepancy between the existing and new categories. Under the guidance of the learned objectness knowledge, we utilize multiple instance learning (MIL) to model the concepts of both objects and distractors (e.g., object parts) and to further improve the ability to reject distractors in weakly labeled images. Our robust objectness transfer approach outperforms existing MSD methods and achieves state-of-the-art results on the challenging ILSVRC2013 detection dataset and the Pascal VOC datasets.
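To show where the transferred objectness fits into the MIL step, here is a minimal PyTorch sketch of an image-level MIL loss over region proposals. The gating threshold tau and all names are illustrative assumptions rather than the thesis's exact formulation.

    import torch
    import torch.nn.functional as F

    def mil_image_loss(instance_logits, objectness, image_labels, tau=0.5):
        """instance_logits: (P, C) per-proposal class logits; objectness: (P,)
        scores from the transferred, domain-invariant objectness model;
        image_labels: (C,) multi-hot image-level labels, the only supervision
        available for the weakly labeled categories."""
        # Reject distractors (e.g., object parts) scored low by the
        # objectness model before any instance-level pooling.
        keep = objectness > tau
        logits = instance_logits[keep] if keep.any() else instance_logits
        # Max-pool the surviving instance logits into a single image-level
        # prediction, so the image label can supervise the proposal classifier.
        image_logits = logits.max(dim=0).values
        return F.binary_cross_entropy_with_logits(image_logits, image_labels)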
Pages: 166
Language: Chinese (中文)
Document Type: Thesis (学位论文)
Identifier: http://ir.ia.ac.cn/handle/173211/23792
Collection: 毕业生_博士学位论文 (Graduates / Doctoral Theses)
Recommended Citation (GB/T 7714):
李岩. 物体分类与检测中的小样本学习[D]. 中国科学院自动化研究所, 2019.
Files in This Item:
File Name: Thesis-20190527-liya (78887 KB) | DocType: Thesis (学位论文) | Access: Restricted (暂不开放) | License: CC BY-NC-SA | Apply for Full Text
