CASIA OpenIR > Graduates > Master's Theses
Research on Out-of-Distribution Detection and Domain Generalization Algorithms
裴森 (Pei Sen)
2023-05-25
Pages: 88
Degree type: Master
Abstract (Chinese)

With the rapid development of computer vision, a large number of deep learning models have been deployed in real-world scenarios, bringing convenience to people's lives. However, the journey of deep learning models from the laboratory to practical application has not been smooth: researchers have found that even models that perform well on test data cannot be relied upon to handle real-world inputs properly. The reason is that traditional computer vision tasks usually assume the training and test sets satisfy the independent and identically distributed (i.i.d.) condition, i.e., the training and test data are random samples from the same data distribution. Although the i.i.d. condition is hard to verify, vision tasks are usually constructed so that the test data do not differ too greatly from the training data, which keeps the evaluation valid. Under the i.i.d. assumption, the criteria for judging model performance focus more on how well the model fits the data on different tasks (e.g., classification accuracy, or detection precision and recall) while neglecting generalization. Yet for models deployed in real scenarios, generalization is precisely what matters most. Driven by practical needs, research on vision under the "open-world" setting has therefore emerged in large volume.

Generally speaking, the "open-world" setting manifests in two aspects, taking image recognition as an example. The first is openness of categories: the categories of the test set are not restricted to a predefined closed set, and new categories are allowed to appear. Specifically, traditional image classification requires the training and validation (or test) sets to share the same categories, but in real scenarios we cannot effectively constrain the images fed to the model. The model must therefore have some ability to handle "unexpected samples". This ability is the central concern of out-of-distribution detection: these "unexpected samples" are defined as "out-of-distribution samples", whose categories do not overlap with those of the training set. The second is openness of domain information. Definitions of domain vary; this thesis restricts domain information to image style — for example, a cartoon drawing of a piano and a photograph of a real piano belong to different domains of the same category. In detail, in some scenarios we can guarantee that the model's input contains the expected semantic information (i.e., the input category is included in the training set), but the way this semantic information is presented may differ greatly from the training data (i.e., the image style is completely different). Characteristics such as style, background, and appearance can all be collectively referred to as "domains"; this thesis focuses on differences in image style. When the domain information of the test data differs markedly from that of the training data, traditional models fail to recognize the inputs effectively. Considering system safety and reliability, deep learning models deployed in real scenarios should possess good out-of-distribution detection and domain generalization abilities; these also represent two important research directions of model robustness.

Motivated by the demands that real application scenarios place on deep learning research, this thesis focuses on improving models' out-of-distribution detection and domain generalization abilities, reducing the obstacles deep learning models may encounter when deployed in real life. The main contents and contributions of this thesis are summarized as follows:

From the perspective of out-of-distribution sample generation, two methods are proposed to enhance the model's ability to detect out-of-distribution samples: boundary-aware learning (BAL) and Bayesian data augmentation (BayesAug). The former uses generative adversarial networks and the single-step fast gradient sign method (FGSM) to generate boundary samples lying between the in-distribution and out-of-distribution regions; by learning the features of these hard samples, the model can better judge whether an input is out-of-distribution, forming a tight decision boundary between in- and out-of-distribution samples. The latter mines background information from the original training images and uses it as out-of-distribution features to guide the model in learning to discriminate out-of-distribution samples, giving the model the ability to reject: it focuses on the typical features of in-distribution samples and prevents factors such as background information (an out-of-distribution object is also a kind of background information) from interfering with recognition.

From the perspective of feature learning, a method for enhancing the model's generalization performance (domain generalization) is proposed: potential energy ranking (PoER). The method constructs potential energies between data pairs and ranks these energies by the similarity of their semantic information, guiding the model to learn the typical features of different categories and domains. Following this idea, the scheme constrains the shallow layers of the deep neural network to explicitly capture category and domain features; on this basis, domain-related information is filtered out in the deeper layers, and only category-related features are retained for the subsequent classification task, making the model robust to domains. The essential idea is to generate only category-related features, guiding the model to attend to task-relevant feature attributes and avoid interference from domain information.

Detailed comparison and ablation experiments demonstrate the effectiveness of the above methods, and the role of each algorithmic module is analyzed through statistics and visualization, providing an experimental reference for future related research and enriching approaches to image recognition under the "open-world" setting. In addition, this thesis carefully organizes the datasets and source code used by the algorithms and provides concise, easy-to-use interfaces, making it convenient for later researchers to reproduce or use them and consciously upholding the open-source culture valued in the computer vision community.

Abstract (English)

With the rapid advancement of computer vision algorithms, numerous deep learning models have been deployed in real-world scenarios, providing great convenience to people's daily lives. Nonetheless, as these models move from laboratory settings to real-world environments, researchers have discovered that models performing well on validation data do not necessarily handle inputs from open-world scenarios accurately. This stems from an assumption in the conventional definition of computer vision tasks: that the training and test sets are randomly sampled from the same data distribution, i.e., the i.i.d. condition. Although this condition is non-trivial to fully guarantee or verify, when constructing vision tasks we typically avoid making the test data too different from the training data, ensuring evaluation reliability. Under the i.i.d. condition, the metrics focus more on the model's performance on different tasks, such as classification accuracy, precision, and recall, while ignoring the model's generalization ability. However, in real-world scenarios, the model's generalization ability plays a more vital role in guaranteeing the safety of systems. Hence, research on vision in open-world settings is quickly emerging, driven by practical needs.

The open-world setting is mainly reflected in two aspects, as illustrated by the example of image recognition. The first is openness in terms of categories. Traditional image classification requires the same closed set of categories for training and validation, but in a real-world scenario we have no control over the images fed into the model. Therefore, the model should be able to handle such unexpected samples. This kind of openness corresponds to the study of out-of-distribution detection, where the unexpected samples are called out-of-distribution samples. The second is openness in terms of domain information. In some scenarios, we can guarantee that the input to the model contains the desired semantic information, but the variability of this semantic information can be vast. Such characteristics in terms of style, context, and appearance, to name a few, are collectively referred to as domains. Therefore, models deployed for practical usage in real-world scenarios should have good out-of-distribution detection and domain generalization capabilities, ensuring the security and reliability of the machine learning system.

This paper focuses on enhancing models' out-of-distribution detection and domain generalization capabilities, aiming to reduce the barriers to deploying deep learning models in real-life scenarios. The primary contributions of this paper are summarized as follows:

From the perspective of outlier synthesis, two methods are proposed to enhance the model's ability to detect out-of-distribution samples, namely boundary-aware learning (BAL) and BayesAug. The former generates hard out-of-distribution samples located on the boundary between the in-distribution and out-of-distribution regions using generative adversarial networks (GANs) and the fast gradient sign method (FGSM). By learning the features of these hard out-of-distribution samples, the classifier forms more compact decision boundaries between in- and out-of-distribution data and thus better discriminates whether an input is an out-of-distribution sample. The latter extracts pure background information from the original training images and uses it as out-of-distribution supervision, so that the model can better distinguish the background from the target, i.e., focusing on the typical characteristics of in-distribution objects and preventing factors such as background information from interfering with recognition.
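To make the FGSM ingredient concrete, the single step perturbs an input along the sign of the loss gradient, nudging it toward the decision boundary. The sketch below applies it to a toy logistic-regression model; the model, labels, and epsilon are illustrative placeholders, not the thesis's actual GAN-based setup.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_step(x, w, b, y, eps):
    """One FGSM step: x' = x + eps * sign(grad_x BCE(f(x), y)).
    Perturbing against the label increases the loss, pushing x
    toward the boundary -- the idea BAL-style methods exploit to
    synthesize hard boundary samples."""
    p = sigmoid(x @ w + b)       # model confidence for class 1
    grad_x = (p - y) * w         # d(BCE)/dx for a linear logit
    return x + eps * np.sign(grad_x)

x = np.array([0.5, -1.2])
w = np.array([1.0, 2.0])
x_boundary = fgsm_step(x, w, b=0.0, y=1.0, eps=0.1)
# each coordinate shifts by eps in the direction that raises the loss
```

After the step, the model's confidence in the true class drops, so the perturbed point sits closer to the in/out decision boundary than the original.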

To enhance the generalization performance (domain generalization) of the model, we propose potential energy ranking (PoER), which belongs to the representation-learning branch. PoER guides the model to learn typical features of different classes and domains by constructing potentials between data pairs and ranking the energies according to the similarity of their semantic information. Concretely, the shallow layers of the network are constrained to explicitly capture class and domain features; on this basis, domain-related information is filtered out in the deeper layers, and only category-related features are retained for the subsequent classification task, thus making the model robust with respect to domains.
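The ranking idea can be sketched as follows. Using squared Euclidean distance as a stand-in pairwise energy (the thesis's exact energy function may differ), pairs sharing a class should carry lower energy than pairs sharing only a domain, which in turn should carry lower energy than fully unrelated pairs; hinge terms enforce that ordering. Function names and the margin value are illustrative, not taken from the published method.

```python
import numpy as np

def pair_energy(f1, f2):
    """Stand-in potential energy for a feature pair: squared
    Euclidean distance (lower = more semantically compatible)."""
    return float(np.sum((f1 - f2) ** 2))

def ranking_loss(e_same_class, e_same_domain, e_unrelated, margin=1.0):
    """Hinge-style ranking over pair energies: penalize any violation
    of the ordering e_same_class < e_same_domain < e_unrelated,
    each separated by at least `margin`."""
    return (max(0.0, e_same_class - e_same_domain + margin)
            + max(0.0, e_same_domain - e_unrelated + margin))

# a correctly ordered triple incurs zero loss; a violated one does not
ok = ranking_loss(0.2, 1.5, 3.0)      # ordering satisfied with margin
bad = ranking_loss(2.0, 1.0, 1.5)     # same-class pair has too much energy
```

Minimizing such a loss pulls same-class pairs together across domains while keeping unrelated pairs apart, which is the intuition behind filtering domain information out of deeper features.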

The effectiveness of these methods is demonstrated through detailed experiments and ablations, and the effect of each algorithmic module is analyzed through statistics and visualization, providing an experimental reference for future research and enriching the study of image recognition in the open-world setting. In addition, the datasets and source code used in this thesis are carefully organized, with simple and easy-to-use interfaces, helping subsequent researchers reproduce or build on our methods and upholding the open-source culture of the computer vision community.

Keywords: out-of-distribution detection; domain generalization; open-world image recognition
Language: Chinese
Representative paper:
Seven major directions (sub-direction classification): Foundations of Pattern Recognition
State Key Laboratory planning direction: Visual Information Processing
Associated dataset requiring deposit:
Document type: Degree thesis
Identifier: http://ir.ia.ac.cn/handle/173211/51889
Collection: Graduates / Master's Theses
Recommended citation:
GB/T 7714
裴森. 分布外样本检测与域泛化算法研究 [D], 2023.
Files in this item:
File name / size | Document type | Version | Access | License
硕士论文_定稿_裴森.pdf (24852 KB) | Degree thesis | | Restricted access | CC BY-NC-SA
 

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.