开放集模型自适应方法研究 (Research on Open-Set Model Adaptation Methods)
高正清
Source Publication: Dissertation
2024-05-26
Pages: 80
Subtype: Master's
Abstract

Since its inception, deep learning has achieved great success in many fields such as image recognition, natural language processing, and speech recognition. These achievements rest largely on two basic assumptions: the closed-world assumption, i.e., that all categories that may be encountered are already known during training; and the independent and identically distributed (i.i.d.) assumption, i.e., that training and test data are sampled from the same distribution. Real-world environments, however, are far more complex than these assumptions describe: open categories unseen during training appear frequently, and data distributions drift due to various factors. These practical problems pose a severe challenge to the robustness of deep learning models. To address them, this thesis studies open-set model adaptation methods, aiming to design deep learning models that can effectively cope with changes in real environments. Dedicated algorithms are designed for unimodal visual models and for multimodal vision-language models; by improving the models' adaptation ability in open-set environments, the thesis significantly strengthens their robustness to unknown categories and distribution shifts. The main contributions of this thesis are as follows:

1. This thesis proposes a unified entropy optimization method for open-set test-time adaptation. Existing test-time adaptation methods are almost all designed for the closed-set setting, yet at test time a model inevitably encounters many categories unseen during training and therefore must be able to handle out-of-distribution samples properly. Experiments show that existing test-time adaptation methods degrade in the open-set setting, which this thesis attributes to inaccurate estimation of the data distribution and of model confidence, and it proposes a unified entropy optimization method in response. The method first makes a coarse partition of the covariate-shifted test data into in-distribution and out-of-distribution samples, then applies entropy minimization to the former and entropy maximization to the latter, achieving classification of known categories and rejection of unknown categories simultaneously. In addition, sample-level weights are designed to reduce the noise introduced by this partition. Experimental results verify the effectiveness of the proposed method.

2. This thesis proposes an open-set test-time prompt tuning method for vision-language models. Recently, vision-language models have achieved superior performance on a wide range of visual tasks by fully exploiting the rich information of the text modality, exhibiting strong zero-shot recognition ability and potential for open-concept learning. Their performance can be further improved by few-shot prompt tuning on downstream data; however, because the model overfits to the small amount of data, its generalization ability suffers. Hand-crafted prompts generalize to unknown categories more easily than learned prompts. Motivated by this, the thesis combines the advantages of both and proposes a test-time prompt fusion strategy, which uses the maximum concept matching score to produce a dynamic weight for each test sample and thereby obtains an input-dependent prompt. Experimental results show that the proposed method achieves the best performance when base classes and new classes are considered jointly.

Other Abstract

Since its inception, deep learning technology has achieved great success in fields such as image recognition, natural language processing, and speech recognition. These achievements largely rely on two basic assumptions: the closed-world assumption, which posits that all categories to be encountered are already known during the training phase; and the independent and identically distributed (i.i.d.) assumption, which means the training and test data are sampled from the same distribution. However, real-world environments are far more complex than these assumptions describe, frequently presenting open categories unseen during training and data distribution shifts caused by various factors. These real-world problems pose significant challenges to the robustness of deep learning models. To this end, this thesis studies open-set model adaptation methods, aiming to design deep learning models that can effectively cope with changes in the real environment. Specialized algorithms are devised for both unimodal visual models and multimodal vision-language models, markedly improving the models' robustness to unknown categories and distribution shifts in open-set contexts. The main innovations of this thesis are as follows:

1. This thesis proposes a unified entropy optimization method for open-set test-time adaptation. Existing test-time adaptation methods are essentially designed for the closed-set setting. At test time, however, models inevitably encounter a large number of unseen categories, which requires the ability to handle out-of-distribution samples properly. Through experiments, this thesis finds that existing test-time adaptation methods suffer performance degradation in the open-set setting, attributes this to inaccurate estimation of the data distribution and of model confidence, and proposes a unified entropy optimization method in response. The method first roughly partitions the covariate-shifted test data into in-distribution and out-of-distribution data, then applies entropy minimization to the former and entropy maximization to the latter, achieving classification of known categories and rejection of unknown categories simultaneously (see the first sketch after this abstract). Moreover, sample-level weights are adopted to further reduce the noise introduced by the hard data partition. Experimental results verify the effectiveness of the proposed method.

2. This thesis proposes an open-set test-time prompt tuning method for vision-language models. Recently, vision-language models have shown superior performance on various visual tasks by fully exploiting the rich information of the text modality, demonstrating strong zero-shot recognition capability and open-concept learning potential. Their performance can be further improved by few-shot prompt fine-tuning on downstream data. However, the tendency of the model to overfit to the limited data compromises its generalizability, and manually designed prompts generalize to unknown categories more easily than learned prompts. Based on this, this thesis combines the advantages of both and proposes a test-time prompt fusion strategy, which generates a dynamic weight for each test sample from its maximum concept matching (MCM) score, thereby obtaining an input-dependent prompt (see the second sketch below). Experimental results show that the proposed method achieves the best performance when both base classes and new classes are considered.
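
To make the first contribution concrete, the following is a minimal, hypothetical PyTorch sketch of a unified entropy objective. The use of the maximum softmax probability as the in-distribution score, the fixed threshold tau, and the linear sample-level weights are illustrative assumptions, not the implementation described in the thesis.

import torch
import torch.nn.functional as F

def entropy(probs, eps=1e-8):
    # Shannon entropy of each row of a (batch, num_classes) probability matrix.
    return -(probs * (probs + eps).log()).sum(dim=1)

def unified_entropy_loss(logits, tau=0.5):
    # logits: (batch, num_classes) predictions of the adapting model on shifted test data.
    probs = F.softmax(logits, dim=1)
    conf, _ = probs.max(dim=1)             # confidence used as a rough in-distribution score
    is_id = (conf >= tau).float()          # coarse in- vs. out-of-distribution partition

    # Sample-level weights soften the hard partition: samples near the
    # threshold contribute little to either term.
    w_id = (conf - tau).clamp(min=0) / (1.0 - tau)
    w_ood = (tau - conf).clamp(min=0) / tau

    ent = entropy(probs)
    # Entropy minimization on pseudo-in-distribution samples keeps known-class
    # predictions confident; entropy maximization on pseudo-out-of-distribution
    # samples pushes unknown categories toward rejection.
    loss = (w_id * is_id * ent).sum() - (w_ood * (1.0 - is_id) * ent).sum()
    return loss / logits.size(0)

# Hypothetical usage during test-time adaptation:
#   logits = model(test_batch)
#   unified_entropy_loss(logits).backward()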
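
The second contribution's test-time prompt fusion can be sketched similarly. The fragment below assumes CLIP-style L2-normalized image and text features, takes the maximum concept matching (MCM) score under the learned prompt as the dynamic weight, and linearly mixes the text features of the hand-crafted and learned prompts; the temperature and the linear mapping from MCM score to fusion weight are assumptions made for illustration only.

import torch
import torch.nn.functional as F

def mcm_score(image_feat, text_feats, temperature=0.01):
    # image_feat: (d,) and text_feats: (num_classes, d), both L2-normalized.
    sims = image_feat @ text_feats.t()                   # cosine similarity to each class
    return F.softmax(sims / temperature, dim=-1).max()   # maximum concept matching probability

def fused_logits(image_feat, hand_text_feats, learned_text_feats, temperature=0.01):
    # A high MCM score under the learned prompt suggests the sample resembles
    # the base classes seen during prompt tuning, so the learned prompt is
    # trusted more; otherwise the hand-crafted prompt, which generalizes better
    # to new classes, receives more weight.
    alpha = mcm_score(image_feat, learned_text_feats, temperature)
    text_feats = alpha * learned_text_feats + (1.0 - alpha) * hand_text_feats
    text_feats = F.normalize(text_feats, dim=-1)
    return (image_feat @ text_feats.t()) / temperature

An equally plausible reading of the abstract is to fuse the two prompts' output probabilities rather than their text features; that detail is left to the thesis body.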

Keywords: open-set recognition; test-time adaptation; vision-language models
Language: Chinese
Sub-direction classification: Fundamentals of Pattern Recognition
Planning direction of the national key laboratory: Fundamental and Frontier Theories of Artificial Intelligence
Paper associated data
Document Type: Dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/57202
Collection: 毕业生_硕士学位论文 (Graduates: Master's Theses)
Recommended Citation (GB/T 7714):
高正清. 开放集模型自适应方法研究[D], 2024.
Files in This Item:
File Name/Size: Thesis.pdf (7633 KB)
DocType: Dissertation
Access: Restricted (限制开放)
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.