CASIA OpenIR > Graduates > Doctoral Dissertations
Research on Visual Recognition Methods in Domain Shift Scenarios
李经纬 (Li Jingwei)
2024-05-16
Pages: 136
Subtype: Doctoral
Abstract

Visual recognition is one of the most fundamental tasks in computer vision. Thanks to massive amounts of high-quality labeled data that are independent and identically distributed (i.i.d.), its related technologies have been widely applied. In real-world scenarios, however, the training (source) data and the test (target) data often violate the i.i.d. assumption, which gives rise to domain shift. In addition, because data may be subject to privacy protection or may change in real time, the cost of acquiring and annotating them rises further. As a result, models trained with conventional visual recognition methods suffer a significant drop in performance, which severely limits their range of application. To address these challenges, this dissertation draws on the idea of transfer learning to study cross-domain visual recognition methods, aiming to extract rich knowledge from source-domain data and models to support accurate recognition of target-domain data. Under several specific conditions in domain shift scenarios, cross-domain visual recognition methods suffer from fine-grained distribution mismatch, insufficient model interpretability, unstable training, and error accumulation that is difficult to eliminate. This dissertation proposes corresponding theories and methods to improve performance under each of these conditions. The main research contents and contributions are summarized as follows:

(1) To address fine-grained distribution mismatch when target-domain data are available, this dissertation proposes an unsupervised domain adaptation method based on Cross-Attention Map Regularization (CAMR). Inspired by few-shot learning, the method brings the few-shot idea of sample-level interaction into unsupervised domain adaptation. High-confidence Sample Selection and Class-level Matching (HSSCM) is proposed to bridge the two tasks, so that sample-level interaction can also take place during unsupervised domain adaptation training. A plug-and-play, parameter-free Cross-Attention Map Generation Module (CAMGM) is proposed to carry out this interaction. A CAMR term is proposed to constrain the cross-attention maps through a constraint function that encourages correct matches and suppresses incorrect ones, thereby bridging the gap between the source and target domains. The method achieves finer-grained domain matching when target-domain data are available, reduces erroneous knowledge transfer, and improves the accuracy of domain alignment.
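
The cross-attention idea in (1) can be illustrated with a small, dependency-free Python sketch. It is not the CAMR implementation from the dissertation. Several parts are illustrative assumptions: the cosine-similarity-plus-softmax form of the map, the temperature `tau`, the confidence threshold in the hypothetical helper `select_confident`, and the "mean peak attention" penalty in the hypothetical `attention_regularizer`. The sketch shows only the general shape of the technique. It builds a parameter-free map between the local features of a source sample and those of a target sample. It keeps only high-confidence target predictions. It then applies a term that rewards sharp matching for same-class pairs and penalizes it for different-class pairs.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def cross_attention_map(src: List[Vector], tgt: List[Vector], tau: float = 0.1) -> List[List[float]]:
    """Parameter-free cross-attention: row i is a softmax over target positions
    of the cosine similarity between source position i and each target position."""
    rows = []
    for s in src:
        logits = [cosine(s, t) / tau for t in tgt]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def select_confident(probs: List[List[float]], threshold: float = 0.9) -> List[Tuple[int, int]]:
    """Hypothetical selection step: keep target samples whose top class probability
    reaches `threshold`, returning (sample index, pseudo-label) pairs."""
    picked = []
    for i, p in enumerate(probs):
        label = max(range(len(p)), key=p.__getitem__)
        if p[label] >= threshold:
            picked.append((i, label))
    return picked


def attention_regularizer(attn: List[List[float]], same_class: bool) -> float:
    # Mean peak attention is near 1 when every source position attends sharply
    # to one target position, and near 1/len(tgt) when attention is flat.
    peak = sum(max(row) for row in attn) / len(attn)
    # Hypothetical constraint: reward sharp matching for same-class pairs and
    # penalize it for pairs whose (pseudo-)labels disagree.
    return 1.0 - peak if same_class else peak


# Usage: one confident and one uncertain target prediction.
print(select_confident([[0.95, 0.05], [0.6, 0.4]]))  # [(0, 0)]
```

The map has no learnable weights, so it can be computed on top of any backbone's features without changing the network. This is one way to read "plug-and-play".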

(2) To address insufficient model interpretability when the target domain is unknown, this dissertation proposes a domain generalization method based on a Frequency domain Dual Branch Augmentation Module (FDBAM). Based on the Fourier transform, the method considers the roles of both the amplitude spectrum and the phase spectrum in domain generalization. For the amplitude branch, multi-level amplitude spectrum correction and calibration is proposed. Inner-domain Amplitude Distribution Rectification (IADR) operates at the intra-domain level and Cross-domain Amplitude Dirichlet Mixup (CADM) at the inter-domain level. Together they mitigate the impact of domain-specific information and explore more of the feature space. Test-time Amplitude Prototype Calibration (TAPC) further reduces the discrepancy between the source and target domains during evaluation. For the phase branch, Random Symmetric Phase Perturbation (RSPP) is proposed to improve robustness in identifying domain-independent information. By combining the contributions of the two branches, the method improves both the model's interpretability and its generalization to arbitrary target domains when the target domain is unknown.
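
The amplitude/phase decomposition behind (2) can be sketched on 1-D signals with a naive discrete Fourier transform in plain Python. This is an assumption-laden illustration, not FDBAM. Real images would use a 2-D FFT, and the Dirichlet-weighted amplitude mixing only loosely mirrors CADM. We read "symmetric" phase perturbation as an odd-symmetric perturbation that keeps the spectrum conjugate-symmetric, so the signal stays real; this reading is our own interpretation. IADR and TAPC are not shown, and all helper names are hypothetical.

```python
import cmath
import math
import random
from typing import List, Optional, Sequence, Tuple


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spec: Sequence[complex]) -> List[complex]:
    """Inverse of `dft`."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def split(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return the amplitude and phase spectra of a real signal."""
    spec = dft(x)
    return [abs(c) for c in spec], [cmath.phase(c) for c in spec]


def combine(amp: Sequence[float], pha: Sequence[float]) -> List[float]:
    """Rebuild a real signal from amplitude and phase (imaginary residue dropped)."""
    spec = [a * cmath.exp(1j * p) for a, p in zip(amp, pha)]
    return [z.real for z in idft(spec)]


def dirichlet(alpha: float, k: int, rng: random.Random) -> List[float]:
    """Sample Dirichlet(alpha, ..., alpha) weights via normalized Gamma draws."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(g)
    return [v / total for v in g]


def amplitude_dirichlet_mixup(anchor: Sequence[float], others: List[Sequence[float]],
                              alpha: float = 1.0, rng: Optional[random.Random] = None) -> List[float]:
    """Mix the anchor's amplitude with other domains' amplitudes; keep the anchor's phase."""
    rng = rng or random.Random(0)
    amp, pha = split(anchor)
    amps = [amp] + [split(o)[0] for o in others]
    w = dirichlet(alpha, len(amps), rng)
    mixed = [sum(wi * a[k] for wi, a in zip(w, amps)) for k in range(len(amp))]
    return combine(mixed, pha)


def symmetric_phase_perturbation(x: Sequence[float], sigma: float = 0.1,
                                 rng: Optional[random.Random] = None) -> List[float]:
    """Add odd-symmetric noise to the phase so the spectrum stays conjugate-symmetric."""
    rng = rng or random.Random(0)
    amp, pha = split(x)
    n = len(x)
    for k in range(1, (n + 1) // 2):  # skip DC and, for even n, the Nyquist bin
        d = rng.gauss(0.0, sigma)
        pha[k] += d
        pha[n - k] -= d
    return combine(amp, pha)
```

Mixing amplitudes while keeping the anchor's phase follows a common assumption: amplitude carries more domain-specific style, and phase carries more domain-independent structure. The phase perturbation leaves the amplitude spectrum unchanged.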

(3) To address unstable training when source-domain labels are sparse, this dissertation proposes a unified framework for semi-supervised domain generalization integrated with active learning. It analyzes the essential commonalities between semi-supervised learning and active learning and, from this analysis, proposes a new task called active semi-supervised domain generalization. For semi-supervised domain generalization and active learning, it builds a Gradient-Similarity-based Sample Filtering and Sorting (GSSFS) framework. In the semi-supervised domain generalization part, two sample filtering modules select reliable unlabeled source-domain samples and assign them pseudo-labels. In the active learning part, two sample sorting modules select a small number of informative unlabeled source-domain samples to receive ground-truth labels. The two parts are bridged by prediction confidence and gradient similarity and are trained iteratively. At only a small additional annotation cost, this substantially improves the model's performance, training efficiency, and stability when source-domain labels are sparse.
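
A minimal sketch of gradient-similarity-based filtering and sorting for (3) follows, under stated assumptions. The model is reduced to a linear softmax classifier, whose last-layer cross-entropy gradient is `(p - onehot(y))` outer the feature vector. The per-class reference gradient is the mean over labeled samples. The thresholds `conf_thr` and `sim_thr`, the ranking score `conf * sim`, and the function names are hypothetical choices, not the GSSFS modules themselves.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def last_layer_grad(feature: Sequence[float], probs: Sequence[float], label: int) -> List[float]:
    """Cross-entropy gradient w.r.t. the weights of a linear softmax layer:
    (p - onehot(label)) outer feature, flattened row by row."""
    return [(p - (1.0 if c == label else 0.0)) * f for c, p in enumerate(probs) for f in feature]


def filter_and_sort(labeled, unlabeled, conf_thr: float = 0.8, sim_thr: float = 0.5,
                    budget: int = 1) -> Tuple[List[Tuple[int, int]], List[int]]:
    """labeled: (feature, probs, label) triples; unlabeled: (feature, probs) pairs.
    Returns (pseudo-labeled (index, label) pairs, indices to query for annotation)."""
    # Per-class reference gradient: mean last-layer gradient of the labeled samples.
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for f, p, y in labeled:
        g = last_layer_grad(f, p, y)
        acc = sums.get(y, [0.0] * len(g))
        sums[y] = [a + b for a, b in zip(acc, g)]
        counts[y] = counts.get(y, 0) + 1
    refs = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    pseudo, candidates = [], []
    for i, (f, p) in enumerate(unlabeled):
        y = max(range(len(p)), key=p.__getitem__)
        conf = p[y]
        sim = cosine(last_layer_grad(f, p, y), refs[y]) if y in refs else 0.0
        if conf >= conf_thr and sim >= sim_thr:
            pseudo.append((i, y))  # filtering: confident and gradient-aligned -> pseudo-label
        else:
            candidates.append((conf * sim, i))  # sorting: lower score = more informative
    candidates.sort()
    return pseudo, [i for _, i in candidates[:budget]]
```

For example, take a sample whose prediction is confident but whose gradient is orthogonal to the labeled reference (`sim` near 0). It is ranked first for annotation. This illustrates how confidence and gradient similarity can jointly bridge pseudo-labeling and active querying.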

(4) To address hard-to-eliminate error accumulation in a source-free setting where the target domain changes continuously, this dissertation proposes a test-time adaptation method called Elastic-Test-time Entropy Minimization (E-TENT). Different target domains have different distribution characteristics and affect the model differently, so they should not be treated equally. The method measures changes in the target domains and in the model by the mean cosine similarity. It introduces the elasticity of the mean cosine similarity and builds E-TENT on this property. The model is then adaptively updated and restored through an established functional relationship between the mean cosine similarity and the momentum coefficient. Three further improvements allow the method to be applied in multiple real-world scenarios. Together they reduce error accumulation and catastrophic forgetting in the source-free setting with continuously changing target domains.
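
The elastic update in (4) can be sketched as follows, again under explicit assumptions. The entropy objective is the one TENT minimizes. The mean cosine similarity is computed between a batch of features and a reference feature vector. The similarity-to-momentum mapping `m = max(s, 0) ** gamma` is an illustrative choice, not the functional relationship derived in the dissertation. So is the blend `m * adapted + (1 - m) * source`, which pulls parameters back toward the source model when a batch looks dissimilar. The caller is assumed to supply the entropy gradients, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a softmax prediction (the objective TENT minimizes)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def mean_cosine_similarity(batch: Sequence[Sequence[float]], reference: Sequence[float]) -> float:
    """Average cosine similarity between each batch feature and a reference feature."""
    return sum(cosine(f, reference) for f in batch) / len(batch)


def momentum(s: float, gamma: float = 2.0) -> float:
    # Hypothetical monotone map: similar batches keep the adapted weights,
    # dissimilar batches pull the parameters back toward the source model.
    return max(s, 0.0) ** gamma


def elastic_step(params: Sequence[float], source: Sequence[float], grads: Sequence[float],
                 batch: Sequence[Sequence[float]], reference: Sequence[float],
                 lr: float = 0.01, gamma: float = 2.0) -> Tuple[List[float], float]:
    """One hypothetical E-TENT-style step on a flat parameter vector."""
    adapted = [p - lr * g for p, g in zip(params, grads)]  # gradient step on the entropy loss
    m = momentum(mean_cosine_similarity(batch, reference), gamma)
    new = [m * a + (1.0 - m) * w for a, w in zip(adapted, source)]  # elastic update/restore
    return new, m
```

When the batch matches the reference, the step reduces to plain entropy minimization. When it is orthogonal to the reference, the parameters snap back to the source model. This is one simple way a similarity-driven momentum can limit error accumulation.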

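Keywords: transfer learning; visual recognition; unsupervised domain adaptation; domain generalization; test-time adaptation
Language: Chinese
Sub direction classification: Machine learning
Planning direction of the national key laboratory: Virtual-real fusion and transfer learning
Document Type: Dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/56678
Collection: Graduates_Doctoral Dissertations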
Recommended Citation
GB/T 7714
李经纬. 域偏移场景下的视觉识别方法研究[D],2024. (Li Jingwei. Research on Visual Recognition Methods in Domain Shift Scenarios [D], 2024.)
Files in This Item:
File Name/Size | DocType | Version | Access | License
Thesis.pdf (12161 KB) | Dissertation | | Restricted access | CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.