图像理解中的数据不均衡学习方法研究
张劭宇
2024-05-19
Pages: 140
Degree type: Doctoral
Abstract (in Chinese)

在海量数据的驱动下,基于深度学习的图像理解技术取得了长足的进步,诸如图像识别和目标检测等任务的性能得到了显著的提升。同时,快速增长的图像数据也伴随有极强的复杂性,为数据的处理与应用带来了诸多困难,其中一个关键难点源于数据的不均衡分布特性。在自然界中,物体的类别分布普遍是不均衡的,其中最常见的分布形式是长尾分布,即只有一些频繁出现的头部类别拥有足够多的样本,而大多数尾部类别中仅包含少量样本。在训练数据严重不均衡的场景下,基于深度学习的图像理解模型往往会被头部类数据所主导,难以充分学习尾部类特征,最终导致在尾部类上的性能表现不佳。然而在实际问题中,对于尾部类的识别通常具有非常重要的价值。因此,如何提高图像理解模型在数据不均衡场景下的学习效果,是学术界和工业界共同关注的问题。

图像理解中数据不均衡学习的挑战主要源于数据层面和模型层面。1)在数据层面,数据的类别分布不均衡并且尾部类多样性欠缺,目前缺少简单高效的数据处理方法同时对数据分布以及数据多样性进行改善。2)在模型层面,就模型泛化性而言,当前一些方法在提升模型对于尾部类关注度的同时往往会损害特征学习的泛化性,从而导致模型对其他类的识别性能降低;此外,由于训练集与测试集的类别先验分布往往是不同的,由不均衡分布数据训练得到的模型如何在均衡分布以及各种不同分布测试集上表现出良好的泛化性能,仍然有待进一步研究。更进一步,从基础的图像识别任务拓展到目标检测任务,模型后处理的公平性也面临挑战。由于尾部类目标分数天然偏低,容易在跨目标分数排序竞争中被漏检,然而在目标检测模型的学习中却忽视了对于分数排序公平性的优化。针对以上问题,本文面向数据分布不均衡的场景,分别从数据处理和模型学习两个角度展开研究,主要工作包括:

(1)针对图像识别任务中数据分布不均衡、尾部类多样性欠缺的问题,本文提出了一种基于标签分布均衡的数据混合增广方法。当前基于数据混合的方法主要适用于常规图像识别任务,在数据长尾分布的场景下难以带来稳定的性能提升。本文引入标签出现率的概念描述了这类方法产生的标签抑制问题,并提出通过平衡数据混合过程中标签出现率的分布来缓解头部类对尾部类的抑制。本文提出的方法采用两个独立的类别均衡采样器分别对训练数据进行采样,然后将得到的两批样本按照随机比例进行线性混合生成新数据。该方法在增加尾部类数据多样性的同时缓解了标签抑制问题。实验结果表明该方法能够稳定地提升模型在长尾分布图像数据集上的识别准确率,特别是对于尾部类的准确率。

(2)现有的重加权方法在提高尾部类性能的同时,往往会造成头部类性能的下降。针对该问题,本文提出了一种基于均衡知识蒸馏的不均衡图像识别方法。本文首先从分类器梯度的角度分析了重加权方法的作用机制,指出这类方法会天然地导致学习可泛化特征和促进尾部类学习两个目标之间的矛盾,从而引起头部类识别性能降低。基于这一发现,本文提出均衡知识蒸馏对这两个目标进行解耦。该方法借助一个预训练的教师模型,通过两个损失同时对学生模型进行优化:一个是实例均衡交叉熵损失,充分利用样本多样性,学习具有泛化性的特征表示;另一个是类别均衡知识蒸馏损失,对知识蒸馏损失根据类别先验分布进行加权,从而增强对尾部类的关注。实验结果表明该方法可以在提升尾部类识别准确率的同时较好地保持头部类性能,有效地提升了模型整体的识别效果。

(3)针对图像识别模型从不均衡分布训练集到均衡分布测试集的泛化问题,本文提出了一种基于分布统一与概率空间对齐的不均衡图像识别方法。考虑到训练集和测试集的先验分布不匹配会影响模型在测试集上的表现,本文构造了一种基于概率转换的分布统一训练框架来缓解分布不匹配问题。该框架建立了不均衡分布假设和均衡分布假设下的后验概率转换关系,并通过概率转换对模型训练中的分布假设进行了统一。在此基础上,本文进一步分析了在该框架下应用交叉熵损失导致的概率空间不匹配问题,并构造了一种概率空间对齐的师生学习方法。该方法包含教师引导的标签平滑和分布统一知识蒸馏两部分,二者共同保证了较为对齐的概率空间以执行概率转换。实验结果表明该方法可以有效提升模型从不均衡分布训练集到均衡分布测试集的泛化性能,同时可以灵活地将测试分布拓展为各种不同的数据分布并表现出良好的识别效果。

(4)针对目标检测中尾部类目标因分数排序靠后易被漏检的问题,本文提出了一种兼顾目标级判别和全局级排序的目标检测训练框架。该框架的核心在于训练模型同时对每个目标进行分类以及对所有置信分数进行全局排序,其训练损失函数由两部分组成:目标级判别损失旨在确保模型的判别性,促进对单个目标的正确分类;在此基础上,从全局级分数排序的角度提出广义平均精度损失,优化每一类的跨目标分数排序关系,促进排序的公平性。由于尾部类目标出现频率低,分数排序的优化效果较弱,本文将训练中动态累计的每类样本数量信息引入到广义平均精度损失的计算中,以实现对每个类别均衡的排序优化。实验结果表明该框架可以即插即用地与其他不均衡学习方法结合,缓解不公平排序导致的尾部类漏检问题,提升数据不均衡场景下的目标检测性能。

Abstract (in English)

Driven by massive data, deep learning-based image understanding techniques have made remarkable progress, and the performance of tasks such as image recognition and object detection has improved significantly. However, the rapid growth of image data comes with considerable complexity, which poses numerous challenges for data processing and application; one of the key challenges is the imbalanced distribution of the data. In the natural world, the distribution of object categories is generally imbalanced and typically long-tailed: only a few frequently occurring head categories have a sufficient number of samples, while the majority of tail categories contain only a small number of samples. When training data are severely imbalanced, deep learning-based image understanding models tend to be dominated by the head categories and struggle to adequately learn the features of tail categories, which leads to unsatisfactory performance on the tail. Yet in practical scenarios, recognizing tail categories is often of great value. Therefore, improving the performance of image understanding models in data-imbalanced scenarios has become a shared concern of both academia and industry.

The challenges of data-imbalanced learning in image understanding arise mainly at the data level and the model level. 1) At the data level, the class distribution is imbalanced and the sample diversity of tail categories is insufficient, yet simple and effective data processing methods that simultaneously balance the distribution and improve data diversity are still lacking. 2) At the model level, in terms of generalization, existing methods that strengthen the focus on tail categories often harm the generalization of representation learning and degrade recognition performance on the other categories. Moreover, since the class prior distributions of the training set and the test set usually differ, how a model trained on imbalanced data can generalize well to balanced and otherwise differently distributed test sets remains an open problem. Furthermore, when moving from image recognition to object detection, the fairness of post-processing also becomes a challenge: because tail objects naturally receive lower scores, they are prone to being filtered out in the cross-object score ranking competition and thus missed, yet the optimization of ranking fairness is largely overlooked when training object detection models. To address these issues, this dissertation focuses on data-imbalanced scenarios and investigates them from the perspectives of data processing and model learning. The main contributions are summarized as follows:

(1) To balance the data distribution and improve the diversity of tail categories in image recognition, a data augmentation method based on label-distribution-balanced mixup is proposed. Existing mixing-based methods are mainly designed for conventional image recognition tasks and fail to bring stable improvements under long-tailed distributions. This dissertation introduces the concept of label occurrence ratio to characterize the label suppression issue arising from such methods, and proposes to alleviate the suppression of tail categories by head categories by balancing the distribution of label occurrence ratios. The proposed method uses two independent class-balanced samplers to sample from the training set, and then linearly mixes the two batches of samples with a random ratio to generate new data. The method enhances the diversity of tail categories while alleviating the label suppression issue. Experimental results demonstrate that it consistently improves recognition accuracy on long-tailed image datasets, particularly for the tail categories.
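As a rough, self-contained sketch of the sampling-and-mixing procedure described above, the following PyTorch snippet draws two batches with two independent class-balanced samplers and mixes them with a random ratio. The toy class counts, tensor shapes, and the use of WeightedRandomSampler to realize class-balanced sampling are illustrative assumptions, not the dissertation's implementation.

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy long-tailed data: 3 classes with 100 / 10 / 2 samples (hypothetical sizes).
counts = torch.tensor([100, 10, 2])
labels = torch.cat([torch.full((int(n),), c) for c, n in enumerate(counts)])
images = torch.randn(len(labels), 3, 32, 32)
dataset = TensorDataset(images, labels)

# Class-balanced sampling: each sample is drawn with probability inversely
# proportional to its class frequency, so every class is equally likely per draw.
weights = (1.0 / counts.float())[labels]

def balanced_loader(batch_size=8):
    sampler = WeightedRandomSampler(weights, num_samples=len(dataset), replacement=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)

# Two independent class-balanced samplers provide the two batches to be mixed.
loader_a, loader_b = balanced_loader(), balanced_loader()
num_classes = len(counts)

for (xa, ya), (xb, yb) in zip(loader_a, loader_b):
    lam = torch.rand(1).item()                       # random mixing ratio in [0, 1]
    x_mix = lam * xa + (1.0 - lam) * xb              # linear mixing of images
    y_mix = lam * F.one_hot(ya, num_classes).float() \
          + (1.0 - lam) * F.one_hot(yb, num_classes).float()
    # x_mix / y_mix would then be fed to the model with a soft-label loss.
    break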

(2) Existing re-weighting methods often improve performance on tail categories at the cost of performance on head categories. To alleviate this dilemma, an imbalanced image recognition method based on balanced knowledge distillation is proposed. This dissertation first analyzes the mechanism of re-weighting methods from the perspective of classifier gradients, showing that such methods naturally create a contradiction between the two goals of data-imbalanced learning, i.e., learning generalizable representations and facilitating learning of tail categories, thereby degrading recognition performance on head categories. Based on this observation, balanced knowledge distillation is proposed to decouple the two goals. With the help of a pre-trained teacher model, the student model is optimized by the combination of an instance-balanced cross-entropy loss and a class-balanced knowledge distillation loss: the former exploits sample diversity to learn generalizable representations, while the latter weights the distillation loss according to the class prior distribution to place more emphasis on tail categories. Experimental results demonstrate that the proposed method improves the recognition accuracy of tail categories while largely maintaining performance on head categories, thus enhancing the overall recognition performance.
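A minimal sketch of such a two-term objective is given below, assuming inverse-frequency weights for the class-balanced distillation term and a standard temperature-scaled distillation loss; the dissertation's exact weighting scheme may differ in detail.

import torch
import torch.nn.functional as F

def balanced_kd_loss(student_logits, teacher_logits, targets, class_counts,
                     temperature=2.0, alpha=1.0):
    """Combine an instance-balanced CE loss with a class-prior-weighted KD loss.
    The weighting scheme here is an assumption for illustration only."""
    # Instance-balanced cross-entropy: plain CE over all samples.
    ce = F.cross_entropy(student_logits, targets)

    # Class-balanced weights: inverse class frequency, normalized to mean 1.
    freq = class_counts.float() / class_counts.sum()
    w = 1.0 / freq
    w = w / w.mean()

    # Temperature-scaled distillation term, weighted per ground-truth class.
    t = temperature
    log_p_s = F.log_softmax(student_logits / t, dim=1)
    p_t = F.softmax(teacher_logits / t, dim=1)
    kd_per_sample = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=1) * (t * t)
    kd = (w[targets] * kd_per_sample).mean()

    return ce + alpha * kd

# Tiny usage example with random logits (hypothetical shapes).
counts = torch.tensor([100, 10, 2])
student = torch.randn(8, 3, requires_grad=True)
teacher = torch.randn(8, 3)
y = torch.randint(0, 3, (8,))
loss = balanced_kd_loss(student, teacher, y, counts)
loss.backward()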

(3) To improve model generalization from an imbalanced training set to a balanced test set, an imbalanced image recognition method based on distribution unification and probability space alignment is proposed. Considering that the prior distribution mismatch between training and inference degrades performance on the test set, this dissertation introduces a distribution-unified training framework based on probability conversion to alleviate the mismatch. The framework establishes the relationship between posterior probabilities under the imbalanced and balanced distribution assumptions, and unifies the distribution assumption during training through probability conversion. Furthermore, the dissertation analyzes the probability space mismatch induced by applying the cross-entropy loss within this framework, and introduces a probability-space-aligned teacher-student learning method consisting of teacher-guided label smoothing and distribution-unified knowledge distillation. The two components jointly ensure a sufficiently aligned probability space for probability conversion. Experimental results demonstrate that the proposed method effectively improves generalization from an imbalanced training set to a balanced test set, and it can also be flexibly extended to various test distributions with good recognition performance.
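The probability conversion at the heart of such a framework can be illustrated with Bayes' rule: re-expressing the posterior under a different class prior gives p_target(y|x) proportional to p_train(y|x) * p_target(y) / p_train(y), which in logit space amounts to adding log p_target(y) and subtracting log p_train(y). The sketch below only illustrates this conversion; the function names and toy priors are assumptions, not the dissertation's full method.

import torch
import torch.nn.functional as F

def to_balanced_posterior(logits, train_prior, eps=1e-12):
    # Posterior under a uniform (balanced) prior: subtract log of the training prior.
    return logits - torch.log(train_prior + eps)

def to_target_posterior(logits, train_prior, test_prior, eps=1e-12):
    # Generalization to an arbitrary assumed test-time prior.
    return logits + torch.log(test_prior + eps) - torch.log(train_prior + eps)

# Hypothetical example: 3 classes with a 100:10:2 training distribution.
counts = torch.tensor([100., 10., 2.])
train_prior = counts / counts.sum()
logits = torch.randn(4, 3)

balanced_probs = F.softmax(to_balanced_posterior(logits, train_prior), dim=1)
uniform_test = torch.full((3,), 1.0 / 3)
same_probs = F.softmax(to_target_posterior(logits, train_prior, uniform_test), dim=1)
# With a uniform test prior, the two conversions agree (softmax ignores constants).
assert torch.allclose(balanced_probs, same_probs, atol=1e-6)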

(4) To alleviate missed detections of tail categories caused by unfair cross-object score ranking, an object detection training framework that reconciles object-level discrimination and global-level ranking is proposed. The framework trains the model simultaneously on two tasks: classifying each object individually and ranking all confidence scores globally. The training loss consists of two parts. The object-level discrimination loss ensures the discriminative ability of the model and promotes correct classification of individual objects. Complementarily, a generalized average precision loss is proposed from the perspective of global-level score ranking to optimize the cross-object score ranking of each category and promote ranking fairness. Because tail categories appear infrequently, their ranking optimization is relatively weak; therefore, per-class sample counts dynamically accumulated during training are injected into the generalized average precision loss to balance the ranking optimization across categories. Experimental results demonstrate that the framework can be integrated with other imbalanced learning methods in a plug-and-play manner, alleviating missed detections of tail categories caused by unfair ranking and improving detection performance in data-imbalanced scenarios.
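The generalized average precision loss itself is not spelled out in this abstract; the snippet below only illustrates the accompanying idea of weighting a cross-object ranking objective by per-class sample counts accumulated during training, using a simplified margin-based pairwise surrogate. The class sizes, the surrogate, and the weighting rule are assumptions for illustration, not the dissertation's loss.

import torch

class BalancedRankingLoss:
    """Simplified pairwise-ranking surrogate, re-weighted by per-class counts
    that are accumulated dynamically as training proceeds (illustrative only)."""

    def __init__(self, num_classes, margin=0.1):
        self.counts = torch.zeros(num_classes)   # dynamically accumulated counts
        self.margin = margin

    def __call__(self, scores, labels, class_id):
        # scores: confidence scores of all candidate objects for `class_id`
        # labels: 1 for positives (objects of this class), 0 for negatives
        pos = scores[labels == 1]
        neg = scores[labels == 0]
        if pos.numel() == 0 or neg.numel() == 0:
            return scores.sum() * 0.0

        # Update the running sample count for this class.
        self.counts[class_id] += pos.numel()

        # Every positive should rank above every negative by a margin.
        diff = neg.unsqueeze(0) - pos.unsqueeze(1) + self.margin
        rank_loss = torch.clamp(diff, min=0).mean()

        # Rare classes (small accumulated counts) receive larger weights so their
        # ranking is optimized as strongly as that of frequent classes.
        weight = self.counts.sum() / (self.counts[class_id] * len(self.counts))
        return weight * rank_loss

# Hypothetical usage with random scores for one class.
loss_fn = BalancedRankingLoss(num_classes=3)
scores = torch.randn(10, requires_grad=True)
labels = torch.tensor([1, 1, 0, 0, 0, 0, 0, 0, 0, 1])
loss = loss_fn(scores, labels, class_id=2)
loss.backward()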

Keywords: data imbalance; image recognition; object detection; data augmentation; knowledge distillation
Language: Chinese
Sub-direction classification: Object Detection, Tracking and Recognition
Document type: Doctoral dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/57099
Collection: Doctoral Dissertations (Graduates)
Recommended citation (GB/T 7714):
张劭宇. 图像理解中的数据不均衡学习方法研究[D], 2024.
Files in this item:
图像理解中的数据不均衡学习方法研究.pdf (10207 KB); document type: doctoral dissertation; access: restricted; license: CC BY-NC-SA