面向图像识别与物体检测的连续学习研究 (Research on Continual Learning for Image Recognition and Object Detection)
崔波 (Cui Bo)
2022-06
Pages | 120 |
Subtype | Doctoral
Abstract | With the development of deep learning, artificial neural networks have demonstrated strong performance in computer vision. This advantage stems partly from the modeling capacity of deep models and partly from the availability of large amounts of training data. Building on big data, deep neural networks have achieved great success through supervised and self-supervised learning. Nevertheless, many problems remain unsolved, and continual learning is one of the important challenges. A model capable of continual learning must learn new tasks sequentially while maintaining its performance on old tasks. In the strictest setting, the model cannot access the training data of old tasks while learning a new task, so catastrophic forgetting of old tasks becomes a problem that urgently needs to be solved.

Because continual learning has significant theoretical value and broad application prospects, it has become an active research topic. Most existing work addresses continual learning implicitly, mitigating forgetting of old tasks through parameter constraints or knowledge distillation. Explicit approaches include parameter isolation and sample replay: parameter isolation allocates new parameters to every new task, which makes the model grow without bound, while sample replay retains a number of old training samples for every old task, which likewise incurs extra storage. To overcome these shortcomings, this thesis proposes an explicit, unified framework based on generative replay and knowledge transfer to achieve continual learning for image recognition and object detection. Image recognition and object detection focus on global content understanding and local content recognition of images, respectively, and are two closely related fundamental tasks in computer vision. The contributions of this thesis are as follows:

1. Class-incremental image recognition based on collaborative generative and discriminative models. Generative replay has been proposed to overcome catastrophic forgetting: while new classes are being learned, samples of previously learned classes are generated and included in training. However, such generative models typically suffer from an increasingly severe distribution mismatch between generated and original samples as learning proceeds. This thesis therefore proposes DeepCollaboration (D-Collab), a collaborative framework of deep generative and discriminative models, to address this problem effectively. A discriminative model incrementally updates the feature space for continual classification, while a generative model uses the feature distributions produced by the discriminative model, instead of generic latent variables, to achieve conditional generation. The two models are coupled through bidirectional training so that the mappings between the feature domain and the image domain form a cycle consistency. The framework also includes a domain alignment module that reduces the discrepancy between the feature distributions of generated and real images; this module further collaborates with the generative and discriminative models to perform effective sample mining, so that suitable generated samples can be selected for training in the new stage. Experiments on continual learning benchmarks for image recognition show that the system significantly outperforms baseline generative replay methods and matches the performance of sample replay methods without incurring additional storage overhead.
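Conditioning generation on feature distributions produced by the discriminative model, rather than on generic latent variables, is the core of the replay idea above. The snippet below is a minimal illustrative sketch of class-conditional feature replay, assuming old-class features can be summarized by per-class Gaussian statistics; the function names and the Gaussian assumption are illustrative, not the thesis's actual formulation.

```python
# Hedged sketch: per-class Gaussian feature statistics used for replay.
# All names here are hypothetical and not taken from the thesis.
import numpy as np

def fit_class_statistics(features, labels):
    """Estimate a (mean, covariance) pair for every class seen so far."""
    stats = {}
    for c in np.unique(labels):
        feats_c = features[labels == c]
        mean = feats_c.mean(axis=0)
        # Small diagonal shrinkage keeps sampling well-conditioned.
        cov = np.cov(feats_c, rowvar=False) + 1e-4 * np.eye(feats_c.shape[1])
        stats[int(c)] = (mean, cov)
    return stats

def replay_features(stats, n_per_class, rng=None):
    """Draw pseudo-features for old classes to mix into a new training batch."""
    rng = rng or np.random.default_rng()
    feats, labels = [], []
    for c, (mean, cov) in stats.items():
        feats.append(rng.multivariate_normal(mean, cov, size=n_per_class))
        labels.append(np.full(n_per_class, c))
    return np.concatenate(feats), np.concatenate(labels)
```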
2. Class-incremental object detection based on feature replay and knowledge transfer. Despite the remarkable progress of DNN-based object detectors, class-incremental object detection (CIOD) remains a challenge: without training samples of old classes, the network tends to detect only instances of the new classes. This thesis proposes the Replay-and-Transfer Network (RT-Net) to address this problem. A new generative replay model uses stored feature distributions of old classes to replay old-class features for the RoI (region of interest) head while new classes are being learned. To cope with drastic changes of the RoI feature space, guided feature distillation and feature translation are introduced to facilitate knowledge transfer from the old model to the new one. In addition, holistic confidence ranking transfer carries the ranking order of proposals over to the new model, so that the region proposal network can still generate high-quality proposals for old classes. The framework provides a general solution for CIOD and applies to two task settings: set-overlapped, in which images of the new training set may contain instances of old classes, and set-disjoint, in which they do not. Extensive experiments on standard benchmarks including PASCAL VOC and COCO show that RT-Net achieves state-of-the-art performance.

3. Class-incremental object detection based on balanced ranking and sorting. Continual learning has attracted wide attention, yet while many continual learning algorithms exist for image recognition, work on class-incremental object detection (CIOD) remains scarce. Most existing methods rely on knowledge distillation, which in essence implicitly trades off performance between old and new classes. This thesis formulates CIOD as learning a global classification-confidence ranking of proposals for every new and old class, and proposes Balanced Ranking and Sorting (BRS) to tackle catastrophic forgetting and data imbalance in CIOD. Specifically, within a unified framework, ranking and sorting with pseudo ground truths (RSP) and ranking and sorting transfer (RST) preserve the knowledge learned by the old model while new classes are learned, together with effective methods for computing pseudo labels and dividing positive and negative samples. To mitigate data imbalance, the gradients of specific sample pairs are rebalanced during training. Extensive experiments on the PASCAL VOC and COCO datasets demonstrate the effectiveness of the method.

In summary, for image recognition and object detection, this thesis systematically studies several key techniques of the generative replay and knowledge transfer strategies in continual learning, and proposes novel solutions to the distribution bias of generated samples, the data imbalance between new and old tasks, and the difficulty of transferring old-task knowledge, greatly alleviating catastrophic forgetting. Through in-depth and systematic experiments, the proposed methods achieve strong results on standard benchmarks and show clear advantages over existing methods, providing effective solutions for class-incremental image recognition without storing old samples and for class-incremental object detection applicable to multiple task settings and detector architectures.
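Both contributions above transfer old-class knowledge from a frozen old model to the new one; knowledge distillation on the old-class outputs is a common building block for this. The following is a minimal sketch of one widely used variant, temperature-scaled logit distillation restricted to old classes; the temperature, slicing convention, and function name are assumptions rather than the exact loss used in the thesis.

```python
# Hedged sketch: distill old-class predictions from the old model into the new one.
import torch
import torch.nn.functional as F

def old_class_distillation_loss(new_logits, old_logits, num_old_classes, T=2.0):
    """KL divergence between new- and old-model predictions over old classes only."""
    p_old = F.softmax(old_logits[:, :num_old_classes] / T, dim=1)
    log_p_new = F.log_softmax(new_logits[:, :num_old_classes] / T, dim=1)
    # Scale by T^2 so the gradient magnitude stays comparable to the cross-entropy term.
    return F.kl_div(log_p_new, p_old, reduction="batchmean") * (T * T)
```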
Other Abstract | With the development of deep learning, artificial neural networks have shown strong performance advantages in the field of computer vision. This advantage comes from the modeling ability of deep models on the one hand, and relies on a large amount of training data on the other. With the help of big data, deep neural networks have achieved great success through supervised learning and self-supervised learning. Nonetheless, deep neural networks still face many unsolved challenges, and continual learning is one of the important ones. A model capable of continual learning needs to learn new tasks sequentially while maintaining its performance on old tasks. In the strictest case, when learning a new task, the model does not have access to the training data of the old tasks, so catastrophic forgetting of old tasks becomes an urgent problem to be solved. Because continual learning has important theoretical value and broad application prospects, it has become a hot research topic. Most existing works address the continual learning problem implicitly, that is, they alleviate the forgetting of old tasks by means of parameter constraints or knowledge distillation. Explicit methods include parameter isolation and sample replay. Parameter isolation assigns new parameters to each new task, which leads to unbounded expansion of the model; sample replay reserves a certain number of old training samples for each old task, which also brings additional storage consumption. To overcome the shortcomings of existing methods, an explicit unified framework based on generative replay and knowledge transfer is proposed to achieve continual learning for image recognition and object detection. Image recognition and object detection focus on the global content understanding and local content recognition of images respectively, and are two closely related basic tasks in computer vision. The contributions of this thesis are as follows:

1. Collaborative generative and discriminative models for class-incremental learning. Researchers have proposed generative replay methods to overcome catastrophic forgetting: samples of previously learned classes are generated while new classes are being learned and participate in training synchronously. However, such generative models usually suffer from an increasing distribution mismatch between generated and original samples as learning proceeds. In this work, we propose DeepCollaboration (D-Collab), a collaborative framework of deep generative and discriminative models, to solve this problem effectively. We develop a discriminative model to incrementally update the latent feature space for continual classification. At the same time, a generative model is introduced, which uses the feature distributions produced by the discriminative model in place of latent variables to achieve conditional generation. The generative and discriminative models are connected by bidirectional training so that the mappings between the feature domain and the image domain form a cycle consistency. In addition, the framework includes a domain alignment module to mitigate the discrepancy between the feature distributions of generated and real images. This module further collaborates with the generative and discriminative models to perform effective sample mining, so that suitable generated samples can be selected to participate in the new stage of training. Experiments on continual learning benchmarks for image recognition show that the system significantly outperforms baseline generative replay methods, achieving performance comparable to sample replay methods without incurring additional storage overhead.

2. Replay-and-transfer network for class-incremental object detection. Despite the remarkable performance achieved by DNN-based object detectors, class-incremental object detection (CIOD), in which the network has to learn to detect novel classes sequentially, remains a challenge. Catastrophic forgetting is the main problem underlying this difficulty: when training samples for old classes are absent, the network tends to detect only the new classes. In this work, we propose the Replay-and-Transfer Network (RT-Net) to address this issue and accomplish CIOD. We develop a generative replay model that uses stored latent feature distributions to replay features of old classes for the RoI (region of interest) head while new classes are being learned. To overcome the drastic changes of the RoI feature space, guided feature distillation and feature translation are introduced to facilitate knowledge transfer from the old model to the new one. In addition, we propose holistic ranking transfer, which transfers the ranking order of proposals to the new model, to enable the region proposal network to identify high-quality proposals for old classes. Importantly, this framework provides a general solution for CIOD and can be applied to two task settings: set-overlapped, in which the old and new training sets overlap, and set-disjoint, in which the old and new tasks have disjoint samples. Extensive experiments on standard benchmark datasets including PASCAL VOC and COCO show that RT-Net achieves state-of-the-art performance for CIOD.

3. Balanced ranking and sorting for class-incremental object detection. Class-incremental learning has drawn much attention recently. Although many algorithms have been proposed for class-incremental image classification, developing object detectors that can learn incrementally is still a challenge. Existing methods rely on knowledge distillation to achieve class-incremental object detection (CIOD), which in essence implicitly trades off performance between old and new classes. In this work, CIOD is formulated as learning a global classification-confidence ranking of candidate boxes for each new and old class. We propose balanced ranking and sorting (BRS) to tackle the catastrophic forgetting and data imbalance problems in CIOD. Specifically, ranking & sorting with pseudo ground truths (RSP) and ranking & sorting transfer (RST) are developed in a unified framework to preserve the knowledge learned by the old model while learning new classes; effective pseudo ground-truth labeling and positive/negative sample division methods are also proposed. To mitigate the data imbalance problem, gradient rebalancing is performed on specific sample pairs during training. We demonstrate the effectiveness of our approach with extensive experiments on the PASCAL VOC and COCO datasets.

In summary, for image recognition and object detection, this thesis systematically studies several key techniques of the generative replay and knowledge transfer strategies in continual learning, addressing the distribution bias of generated samples, the data imbalance between new and old tasks, and the difficulty of transferring old-task knowledge. The proposed solutions greatly alleviate catastrophic forgetting in the two tasks. Through in-depth and systematic experiments, the proposed methods achieve state-of-the-art performance on standard benchmarks, with clear advantages over existing methods, and thus provide effective solutions for class-incremental image recognition without storing old samples and for class-incremental object detection applicable to multiple task settings and detector architectures.
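The ranking-transfer components described above (holistic ranking transfer in RT-Net, RST in BRS) share the idea of carrying the old model's ordering of proposals over to the new model. A minimal sketch of that general idea as a pairwise margin loss follows; the pair construction and margin are illustrative assumptions, not the formulation used in the thesis.

```python
# Hedged sketch: preserve the old detector's proposal ordering in the new detector.
import torch

def ranking_transfer_loss(new_scores, old_scores, margin=0.0):
    """Penalize proposal pairs whose ordering under the new model contradicts the old one."""
    # Pairwise score differences; entry (i, j) is score[i] - score[j].
    diff_old = old_scores.unsqueeze(1) - old_scores.unsqueeze(0)
    diff_new = new_scores.unsqueeze(1) - new_scores.unsqueeze(0)
    # Consider all ordered pairs the old model ranked as i above j.
    mask = diff_old > 0
    # Hinge on the new-model score difference for every such pair.
    violations = torch.clamp(margin - diff_new[mask], min=0.0)
    return violations.mean() if mask.any() else new_scores.sum() * 0.0
```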
Keyword | Continual Learning; Deep Learning; Image Recognition; Object Detection
Language | Chinese
Document Type | Thesis
Identifier | http://ir.ia.ac.cn/handle/173211/48937 |
Collection | 毕业生_博士学位论文 (Graduates: Doctoral Dissertations)
Recommended Citation GB/T 7714 | 崔波. 面向图像识别与物体检测的连续学习研究[D]. 中国科学院自动化研究所, 2022.
Files in This Item:
File Name/Size | DocType | Version | Access | License
Thesis_cb_signed.pdf (25344 KB) | Thesis | – | Restricted (暂不开放) | CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.