Research on the Domain Shift Problem in Zero-Shot Learning
刘博
2022-05
Pages: 162
Degree Type: Doctoral
Abstract

Zero-shot learning (ZSL) aims to learn a model from seen-class data alone that can recognize unseen-class data. Because it reduces the heavy dependence of deep learning models on large-scale labeled datasets and eases the deployment of deep learning systems in realistic open-set environments, it has recently attracted broad attention in machine learning and its application fields. The basic idea of ZSL is to learn a mapping between visual features and semantic features on the seen-class domain and then transfer the learned mapping to the unseen-class domain. However, the visual-semantic mapping in the seen-class domain is naturally shifted from that in the unseen-class domain, a discrepancy known as domain shift, so a mapping learned from seen-class data has limited generalization ability on unseen classes. Although many methods addressing domain shift have been proposed from different perspectives, the problem remains largely unsolved. In this thesis, we study the domain shift problem in ZSL from four perspectives: the hard-class phenomenon, feature generation, exploitation of unlabeled data, and a new task scenario. Specifically, our main contributions are as follows:

1. Existing works on the domain shift problem treat the unseen classes as a whole and lack an in-depth analysis of how severely different unseen classes are affected. We discovered a ubiquitous but previously ignored phenomenon in ZSL, the hard-class phenomenon: different unseen classes suffer different degrees of domain shift. Based on this discovery, we first demonstrated empirically that hard classes are of particular importance for alleviating the domain shift problem. We then analyzed the possible causes of the phenomenon in depth, showing that semantic adjacency is an essential factor in the emergence of hard classes. Building on these analyses, we proposed criteria for identifying hard classes under two different ZSL settings, together with two hard-class-based learning frameworks that can be applied to most existing ZSL methods and substantially boost their performance.
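The abstract does not spell out the hard-class criteria themselves; as a purely illustrative sketch of the semantic-adjacency idea (the function, score, and toy attribute matrix below are hypothetical, not the thesis's actual criteria), one could score each unseen class by how close its nearest semantic neighbours are:

```python
import numpy as np

def hardness_scores(attrs, k=1):
    """Toy hard-class score: the mean cosine similarity of each class's
    attribute vector to its k nearest semantic neighbours (self excluded).
    A higher score means closer semantic neighbours, i.e. a class that is
    more easily confused and hence 'harder' under domain shift."""
    a = attrs / np.linalg.norm(attrs, axis=1, keepdims=True)
    sims = a @ a.T                        # pairwise cosine similarity
    np.fill_diagonal(sims, -np.inf)       # exclude self-similarity
    topk = np.sort(sims, axis=1)[:, -k:]  # k most similar neighbours
    return topk.mean(axis=1)

# Toy attribute vectors: classes 0 and 1 are semantically adjacent,
# class 2 is semantically isolated.
attrs = np.array([[1.0, 0.0, 0.0],
                  [0.9, 0.1, 0.0],
                  [0.0, 0.0, 1.0]])
scores = hardness_scores(attrs, k=1)
```

Under this toy score, classes 0 and 1, whose attribute vectors nearly coincide, come out harder than the semantically isolated class 2.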

2. Due to domain shift, the distribution of visual features generated by existing generative ZSL methods is usually shifted away from the true distribution. To alleviate this problem, we proposed a method that generates visual features by combining visual prototypes with adversarial visual residuals. In this method, a visual feature is decomposed into a visual prototype and a visual residual, which are learned by two separate models. A visual prototype predictor predicts the visual prototype of each class, representing the center of that class's visual feature distribution; in this way, the shift of the generated distribution is explicitly constrained. Within the predictor, we designed a feature selection method based on semantic error to select semantically consistent visual feature dimensions. Meanwhile, unlike existing approaches that model the visual feature distribution directly, we used a generative adversarial network to model the distribution of visual residuals. Finally, large numbers of visual features are synthesized by combining visual prototypes with sampled residuals, yielding a generated distribution with a smaller shift from the true one.
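The prototype-plus-residual decomposition can be illustrated with a minimal sketch. The linear predictor, the Gaussian residual sampler, and all variable names below are hypothetical stand-ins for the learned components described above (a trained predictor and an adversarially trained residual generator), not the thesis's actual models:

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_prototype(class_attr, W):
    """Stub prototype predictor: a linear map from a class's semantic
    attributes to the centre of its visual feature distribution."""
    return class_attr @ W

def sample_residuals(n, dim, scale=0.1):
    """Stub for the residual generator: class-agnostic residuals drawn
    around zero (the thesis uses a GAN here instead)."""
    return scale * rng.standard_normal((n, dim))

def synthesize_features(class_attr, W, n=64):
    """Synthesized features = predicted prototype + sampled residuals,
    so the distribution centre is explicitly pinned to the prototype."""
    proto = predict_prototype(class_attr, W)
    return proto + sample_residuals(n, proto.shape[-1])

attr = np.array([1.0, 0.0, 1.0])   # toy attribute vector for one class
W = rng.standard_normal((3, 5))    # toy attribute-to-visual map
feats = synthesize_features(attr, W, n=256)
```

Because every synthesized feature is the prototype plus a zero-centred residual, the empirical mean of `feats` stays near the predicted prototype, which is exactly the explicit constraint on distribution shift described above.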

3. Exploiting unlabeled unseen-class data is widely recognized as an effective way to alleviate the domain shift problem in ZSL. Existing works usually exploit unlabeled data through self-training of a single model, which can easily become trapped in a local minimum. We proposed a transductive learning framework based on multi-model collaborative training to mine unlabeled data. In this framework, two (or more) base models, which can be any inductive ZSL methods, generate pseudo labels for unseen-class data, and a pseudo-label exchanging module guides the collaborative training among the base models. To improve the efficiency of collaborative training, we further proposed two guidelines for choosing appropriate base models and three schemes for exchanging pseudo labels. In addition, to extend the framework to generalized ZSL, we proposed a semantic-knowledge-based domain detection model that alleviates the bias problem in the generalized setting.
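As a hedged sketch of one possible pseudo-label exchange scheme (the thesis proposes three; the confidence-based selection below and the `top_frac` parameter are only illustrations), each base model could hand its most confident pseudo-labels to its peer for retraining:

```python
import numpy as np

def confident_pseudo_labels(probs, top_frac=0.2):
    """Pick the most confident fraction of one model's class-probability
    predictions on unlabeled data; return sample indices and labels."""
    conf = probs.max(axis=1)
    n_keep = max(1, int(top_frac * len(conf)))
    idx = np.argsort(conf)[-n_keep:]       # most confident samples
    return idx, probs[idx].argmax(axis=1)

def exchange(probs_a, probs_b, top_frac=0.2):
    """Hand each model's most confident pseudo-labels to the *other*
    model, so the peers correct each other instead of reinforcing
    their own errors, as single-model self-training tends to do."""
    to_b = confident_pseudo_labels(probs_a, top_frac)
    to_a = confident_pseudo_labels(probs_b, top_frac)
    return to_b, to_a

# Toy class-probability outputs of two base models on 5 unlabeled samples.
probs_a = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8],
                    [0.55, 0.45], [0.5, 0.5]])
probs_b = probs_a[::-1]
(idx_to_b, lab_to_b), (idx_to_a, lab_to_a) = exchange(probs_a, probs_b)
```

The returned index/label pairs would then be appended to the peer model's training set for the next round of collaborative training.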

4. Current ZSL research mainly focuses on image classification. Zero-shot learning for 3D scene semantic segmentation is of great application value but had not yet been explored in the literature. We introduced a new zero-shot 3D scene semantic segmentation task and established the corresponding benchmarks, evaluation protocols, and baselines. To tackle the associated domain shift problem, we proposed a semantics-conditioned generative model for 3D point features and designed a feature augmentation method based on geometric relationships for training the model.

Keywords: Zero-Shot Learning; Domain Shift Problem; Hard-Class Phenomenon; Feature Generation; Transductive Learning
Subject Area: Computer Science and Technology
Discipline: Engineering
Language: Chinese
Document Type: Doctoral Dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/48957
Collection: Graduates_Doctoral Dissertations
Recommended Citation (GB/T 7714):
刘博. Research on the Domain Shift Problem in Zero-Shot Learning [D]. Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences, 2022.
Files in This Item
File: Thesis_signed.pdf (3293 KB)
Document Type: Dissertation
Access: Restricted
License: CC BY-NC-SA
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.