Research on Domain Generalization in Image Recognition


	Research on Domain Generalization in Image Recognition
	Liu Geng
	2023-06-20
Pages	76
Degree type	Master's
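Abstract (Chinese)	Deep learning has made great progress in fields such as computer vision and natural language processing. However, the performance of conventional deep learning models often degrades severely under domain shift, that is, when the distribution of the test data differs substantially from that of the training data. To address this, many Domain Generalization (DG) methods have been proposed that attempt to generalize a model trained on multiple source domains to unseen target domains. This thesis studies both the classic domain generalization problem and the harder, more realistic open-set domain generalization problem, proposes novel methods to improve model performance, and evaluates them systematically on several domain generalization datasets. The main contributions are as follows: 1. To address the insufficient domain diversity of training data in current domain generalization tasks, this thesis proposes a text-guided domain generalization method based on a large-scale vision-language pretrained model, which introduces additional text information to increase the domain diversity of the training data and thereby improve generalization. The method first designs a domain-relevant word generation procedure that uses a pretrained lexical substitution model to automatically generate a number of words related to image domains, extending the range of descriptions available for different domains. It then proposes a prompt-learning-based text feature generation method: the generated domain-relevant words are used to produce domain-relevant text, a text encoder maps the domain information in the text into the feature space shared by text features and image features, and the text prompt templates are trained during this process so that the text features carry richer domain information. Finally, the method uses the features of the input images and of the generated text to train a specially designed normalized classifier, which generalizes better to unseen target domains, while the image encoder is also updated by the gradients backpropagated from the classifier. Experiments on several domain generalization datasets show that the method makes effective use of the generated text information and achieves excellent performance on domain generalization tasks in an easy-to-implement way. 2. Many domain generalization methods have been proposed to improve model generalization and reduce the impact of domain shift, and thereby improve performance on unseen test domains. However, conventional domain generalization methods all assume that the training data and the test data share the same category space, an assumption that often fails in practice. This thesis therefore further studies the open-set domain generalization problem, in which the category spaces of the training and test data differ. It proposes an open-set domain generalization framework based on a Siamese network. The framework constructs plausible unknown-class data by splitting the original training images into patches and shuffling them, and uses these data as negative samples to continually provide negative supervision, so that the model learns the truly critical feature representations. This reduces overfitting to the original training data, effectively suppresses model overconfidence, and thus improves performance on open-set domain generalization tasks. Experiments show that the framework achieves the best performance to date on two open-set domain generalization datasets.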
Abstract (English)	Deep learning has made great progress in many fields, such as computer vision and natural language processing. However, the performance of traditional deep learning models degrades seriously under domain shift, that is, when the distribution of the test data differs significantly from that of the training data. A large number of Domain Generalization (DG) methods have been proposed to generalize a model trained on multiple source domains to unseen target domains. The main contributions of this thesis are as follows: 1. To address the lack of domain diversity in the training dataset, we develop a novel Text-guided Domain Generalization (TDG) paradigm based on a large-scale vision-language pretrained model, which improves the domain diversity of the training dataset by introducing extra text information. Specifically, TDG first devises an automatic word generation method to extend the description of the current domains with novel domain-relevant words. Then, using the proposed prompt-learning-based text feature generation method, it embeds the generated domain information into the text feature space, which shares a common representation space with the image features. Finally, TDG uses both input image features and generated text features to train a specially designed classifier that generalizes well to unseen target domains, while the image encoder is also updated by the gradients backpropagated from the classifier. Experimental results on several domain generalization benchmarks show that the proposed framework is easy to implement and achieves superior performance by effectively leveraging the generated text information. 2. A large number of domain generalization methods have been proposed to enhance the generalizability of models, so as to reduce the impact of domain shift and improve performance on unknown target domains. However, traditional domain generalization methods assume that the category spaces of the training data and the test data are consistent, which often does not hold in practice. Therefore, this thesis further studies the open-set domain generalization problem, in which the category spaces of the training data and the test data are inconsistent. It proposes an open-set domain generalization framework based on a Siamese network, which generates images of unknown categories through patch shuffling and treats the generated images as negative samples that negatively supervise the model. The model is thereby forced to learn the critical feature representations, its overfitting is reduced, and its performance on open-set domain generalization tasks improves. The experimental results show that the proposed framework achieves state-of-the-art performance on two open-set domain generalization benchmarks.
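Keywords	deep learning; image recognition; domain generalization; open-set recognition
Subject area	Pattern recognition
Discipline category	Engineering; Engineering::Computer Science and Technology (degrees in engineering or science may be conferred)
Language	Chinese
Representative paper	Yes
Seven major directions: sub-direction classification	Brain-inspired models and computing
State Key Laboratory planned direction classification	Brain-inspired multimodal intelligent models and algorithms
Associated dataset to be deposited	No
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/52317
Collection	Graduates_Master's theses
Recommended citation (GB/T 7714)	Liu Geng. Research on Domain Generalization in Image Recognition [D], 2023.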
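
The first contribution in the abstract trains a normalized classifier on both image features and generated text features that live in a shared feature space. The record does not give the classifier's exact form, so the following is only a minimal sketch under stated assumptions: the classifier is taken to be a cosine classifier (L2-normalized features and weights with a fixed scale), and the names `cosine_logits`, `joint_loss`, and the `scale` parameter are hypothetical, not taken from the thesis.

```python
import math


def l2_normalize(vec):
    """Return vec scaled to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def cosine_logits(feature, class_weights, scale=10.0):
    """Assumed form of a normalized classifier: scaled cosine similarity.

    Both the feature and each class weight are L2-normalized, so the logit
    depends only on direction, not on feature magnitude.
    """
    f = l2_normalize(feature)
    logits = []
    for w in class_weights:
        w_hat = l2_normalize(w)
        logits.append(scale * sum(a * b for a, b in zip(f, w_hat)))
    return logits


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def joint_loss(image_batch, text_batch, class_weights):
    """Average the classification loss over image and text features together.

    Each batch is a list of (feature, label) pairs in the same feature space.
    """
    samples = list(image_batch) + list(text_batch)
    losses = [cross_entropy(cosine_logits(f, class_weights), y) for f, y in samples]
    return sum(losses) / len(losses)


# Toy example: two classes in a 2-D shared feature space.
class_weights = [[1.0, 0.0], [0.0, 1.0]]
image_batch = [([2.0, 0.1], 0)]   # stands in for an image-encoder feature
text_batch = [([0.2, 3.0], 1)]    # stands in for a text-encoder feature
loss = joint_loss(image_batch, text_batch, class_weights)
```

Because the logits depend only on direction, features of different magnitudes from the image and text encoders are treated on an equal footing, which is one plausible reason to normalize when mixing the two modalities.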

Files in this item
File name/size	Document type	Version type	Access type	License
ucasthesis.pdf (6822KB)	Thesis		Restricted access	CC BY-NC-SA
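
The second contribution in the abstract builds unknown-class negative samples by splitting training images into patches and shuffling them. A minimal sketch of that patch-shuffling step follows, assuming a single-channel image stored as a list of rows and a square grid of equally sized patches. The function name `patch_shuffle` and the `grid` and `seed` parameters are hypothetical, and the thesis may choose patch sizes or permutations differently.

```python
import random


def patch_shuffle(image, grid, seed=None):
    """Split a 2-D image (list of rows) into grid x grid patches and permute them.

    Assumes the image height and width are both divisible by `grid`.
    """
    height, width = len(image), len(image[0])
    if height % grid or width % grid:
        raise ValueError("image size must be divisible by grid")
    ph, pw = height // grid, width // grid

    # Cut the image into patches in row-major order.
    patches = []
    for gy in range(grid):
        for gx in range(grid):
            rows = image[gy * ph:(gy + 1) * ph]
            patches.append([row[gx * pw:(gx + 1) * pw] for row in rows])

    # Randomly permute the patch order.
    rng = random.Random(seed)
    rng.shuffle(patches)

    # Reassemble the permuted patches into an image of the original size.
    shuffled = []
    for gy in range(grid):
        for r in range(ph):
            row = []
            for gx in range(grid):
                row.extend(patches[gy * grid + gx][r])
            shuffled.append(row)
    return shuffled


# Toy 4x4 image whose pixel values are 0..15, split into a 2x2 grid of patches.
image = [[y * 4 + x for x in range(4)] for y in range(4)]
negative = patch_shuffle(image, grid=2, seed=0)
```

The shuffled image keeps every pixel of the original, so local texture statistics survive while the global object structure is destroyed, which is why such an image can plausibly act as a sample from no known class.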