Research on the Transferability of Vision Models under Label-Scarce Conditions
许逸凡
2022-05-16
Pages | 60 |
Subtype | Master's Thesis
Abstract | With the rapid development of artificial intelligence and the rise of deep learning, computer vision has achieved remarkable success in recent years. Much of this success rests on the growth of big data and the strong fitting capacity of deep neural networks. However, many real-world scenarios bring two challenges: (1) label scarcity: obtaining large amounts of manually annotated labels is time-consuming and labor-intensive; (2) test-environment shift: a trained vision model deployed in practice often encounters a test environment that differs from the training environment. Since unlabeled raw data are usually easy to obtain, how to make vision models learn stronger transferability from large amounts of unlabeled samples is a topic worth studying. This thesis explores the transferability of vision models under label-scarce conditions, i.e., learning vision models that can adapt to changeable test environments from labeled data together with a certain amount of unlabeled data.

For the transferability problem, this thesis holds that learning algorithms and model architectures complement each other, and explores both aspects. On the algorithm side, it extends the traditional domain adaptation task and proposes corruption-robust domain adaptation, so that the model is not only transferable across domains but also robust to test-time corruptions unseen during training. On the architecture side, it studies the transferability of the vision transformer (ViT), a non-convolutional architecture built purely on self-attention. The contributions of this thesis include:

(1) Corruption-robust domain adaptation for realistic scenarios. The corruption-robust domain adaptation task is proposed on top of the traditional domain adaptation task, and an unsupervised sample-augmentation scheme based on domain-discrepancy information is designed, which greatly improves cross-domain corruption robustness while further improving transfer performance on the original domain adaptation task.

(2) Transferability of the vision transformer (ViT), a non-convolutional architecture built purely on self-attention. The advantages of ViT over convolutional neural networks in transferability, and the reasons behind them, are analyzed both theoretically and experimentally.

(3) Lightweight transferable design of ViT. To address the heavy redundant computation of vanilla ViT and its difficulty with dense prediction tasks, the thesis exploits ViT's inherent dynamic correlation computation and designs a dynamic sparse token update strategy based on global class attention, which makes ViT lightweight while preserving its original transfer performance.
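The abstract describes the first contribution, an unsupervised augmentation scheme driven by domain-discrepancy information, only at a high level. The snippet below is a minimal illustrative sketch of that general idea, not the thesis's actual algorithm: it assumes a PyTorch feature extractor and uses channel-wise feature statistics as a crude proxy for the domain gap; the function name `domain_gap_augment` and the tanh-based mixing coefficient are hypothetical choices for illustration.

```python
import torch

def domain_gap_augment(src_feat, tgt_feat):
    """Hypothetical sketch: perturb source features toward the target
    domain's channel statistics, with a strength tied to the measured gap.

    src_feat: (N, C, H, W) labeled-source feature maps.
    tgt_feat: (M, C, H, W) unlabeled-target feature maps from the same backbone.
    """
    eps = 1e-6
    # Per-sample source statistics and batch-level target statistics
    # (channel mean/std as a simple stand-in for "domain style").
    src_mu = src_feat.mean(dim=(2, 3), keepdim=True)
    src_std = src_feat.std(dim=(2, 3), keepdim=True) + eps
    tgt_mu = tgt_feat.mean(dim=(0, 2, 3), keepdim=True)
    tgt_std = tgt_feat.std(dim=(0, 2, 3), keepdim=True) + eps

    # Larger measured discrepancy -> stronger perturbation toward the target.
    gap = (src_mu - tgt_mu).abs().mean() + (src_std - tgt_std).abs().mean()
    strength = torch.tanh(gap)  # squash to (0, 1); zero gap means no change

    # Re-normalize source features, then re-style them with mixed statistics.
    mixed_mu = (1 - strength) * src_mu + strength * tgt_mu
    mixed_std = (1 - strength) * src_std + strength * tgt_std
    normalized = (src_feat - src_mu) / src_std
    return normalized * mixed_std + mixed_mu
```

The augmented features can then be fed to the classifier alongside the originals as extra, label-preserving training samples; in the thesis the actual augmentation and its use in the corruption-robust domain adaptation objective may differ.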
Other Abstract | With the rapid development of artificial intelligence and deep learning technology, computer vision has achieved great success in recent years. These successes are inseparable from the development of big data and the strong fitting ability of deep neural networks. However, two challenges often arise in realistic scenarios. (1) Scarcity of data labels: obtaining labeled data for supervised learning is time-consuming. (2) Shift of the testing environment: a well-trained vision model often faces a testing environment different from the one seen during training. Since unlabeled raw data are much easier to obtain, how to learn a transferable vision model from abundant unlabeled data is a topic worth deeper study. Therefore, this thesis investigates the transferability of vision models under label-scarce scenarios, namely, learning a vision model that can adapt to changeable testing environments from some labeled data and abundant unlabeled data.

Learning algorithms and model architectures benefit from each other, so this thesis investigates the transferability issue from both perspectives. With respect to learning algorithms, we propose corruption-robust domain adaptation, which extends the traditional domain adaptation task. With respect to model architectures, we investigate the transferability of the pure self-attention-based architecture, the vision transformer (ViT). In summary, the main contributions of this thesis are as follows.

(1) We investigate the corruption-robust domain adaptation algorithm for realistic scenarios. An unsupervised data augmentation algorithm based on domain-discrepancy information is designed to enhance the corruption robustness of vision models while maintaining their transferability on the original domain adaptation task.

(2) We investigate the transferability of pure self-attention-based architectures, vision transformers (ViTs). We summarize the advantages of ViTs over convolutional neural networks and analyse the reasons from both theoretical and experimental aspects.

(3) We investigate the lightweight design of ViTs for transferability. Due to heavy computational redundancy, vanilla ViT models are ill-suited to dense prediction tasks. We utilize the dynamic correlation calculation of ViTs to design a slow-fast token evolution strategy based on class attention. The proposed strategy accelerates vanilla vision transformers while maintaining their original transferability.
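The third contribution, the class-attention-guided dynamic sparse token update (described above as a slow-fast token evolution strategy), is likewise only summarized in the abstract. Below is a minimal, hypothetical PyTorch sketch of the underlying idea, ranking patch tokens by the class token's attention and aggregating the low-ranked ones into a single summary token; the function name, the head-averaged scoring, and the weighted-summary step are assumptions for illustration, not the thesis's actual implementation.

```python
import torch

def select_tokens_by_class_attention(tokens, cls_attn, keep_ratio=0.5):
    """Hypothetical sketch of class-attention-guided token selection.

    tokens:   (B, 1 + N, D) -- class token followed by N patch tokens.
    cls_attn: (B, H, N)     -- attention of the class token over patch tokens,
                               one row per attention head.
    Returns the class token, the top-k "informative" patch tokens, and one
    summary token aggregated from the remaining, less informative tokens.
    """
    B, N1, D = tokens.shape
    n_patches = N1 - 1
    n_keep = max(1, int(n_patches * keep_ratio))

    # Average class-token attention over heads as a global importance score.
    score = cls_attn.mean(dim=1)              # (B, N)
    topk = score.topk(n_keep, dim=1).indices  # (B, n_keep)

    patch_tokens = tokens[:, 1:, :]           # (B, N, D)
    keep_mask = torch.zeros(B, n_patches, dtype=torch.bool, device=tokens.device)
    keep_mask[torch.arange(B, device=tokens.device).unsqueeze(1), topk] = True

    # Kept tokens stay in their original spatial order within each sample.
    kept = patch_tokens[keep_mask].reshape(B, n_keep, D)

    # Aggregate the pruned tokens into one summary token, weighted by score.
    drop_scores = (score * (~keep_mask)).unsqueeze(-1)             # (B, N, 1)
    summary = (patch_tokens * drop_scores).sum(1, keepdim=True) / (
        drop_scores.sum(1, keepdim=True) + 1e-6
    )                                                              # (B, 1, D)

    cls_token = tokens[:, :1, :]
    return cls_token, kept, summary
```

In an accelerated ViT layer, the kept tokens and the summary token would go through the full attention/MLP computation while the pruned tokens are updated cheaply (or left unchanged) and re-merged afterwards; how the thesis handles that slow path is not specified in this abstract.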
Keyword | Transferability; Semi-supervised Learning; Domain Adaptation; Self-attention Mechanism
Language | Chinese
Document Type | Thesis
Identifier | http://ir.ia.ac.cn/handle/173211/48504 |
Collection | Graduates_Master's Theses
Corresponding Author | 许逸凡 |
Recommended Citation GB/T 7714 | 许逸凡. 标签稀缺条件下的视觉模型可迁移性研究[D]. 中国科学院自动化研究所, 2022.
Files in This Item:
File Name/Size | DocType | Version | Access | License
许逸凡_硕士毕业论文.pdf (3427KB) | Thesis | | Restricted Access | CC BY-NC-SA