
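    In recent years, deep neural networks (DNNs) built on supervised learning have been widely applied to computer vision tasks such as image classification, semantic segmentation, and object detection, with great success. The strong performance and generalization of these models depend on large amounts of training data and correspondingly accurate training labels, so constructing a high-quality training dataset for each task has become particularly important.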



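    In image classification, experiments show that even when the training labels are completely random, the training loss of DNNs eventually reaches or approaches 0, meaning that the model capacity is sufficient to memorize all noisy labels. Based on this observation, the training process of DNNs under noisy labels is split into a "learning process" and a "memorization process" by decomposing the cross-entropy loss. It is proved that in the learning process the outputs of DNNs fit the hidden noise distribution regardless of whether the sample labels are accurate, while in the memorization process DNNs overfit and their outputs eventually fit the noisy labels.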
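    The contrast between the two processes can be illustrated with a standard identity for the cross-entropy loss. This is only a sketch under assumed notation, not the decomposition used in the thesis: $q(\cdot \mid x)$ denotes the (hidden) noisy label distribution for an input $x$, $p_\theta(\cdot \mid x)$ the softmax output of the network, and $\tilde{y}_i$ the observed noisy label of sample $x_i$.

```latex
% Expected loss over the noisy label distribution: minimized when p_theta = q.
\[
\mathbb{E}_{\tilde{y} \sim q(\cdot \mid x)}\bigl[-\log p_\theta(\tilde{y} \mid x)\bigr]
  = H\bigl(q(\cdot \mid x)\bigr)
  + \mathrm{KL}\bigl(q(\cdot \mid x) \,\|\, p_\theta(\cdot \mid x)\bigr)
  \;\ge\; H\bigl(q(\cdot \mid x)\bigr),
\]
% Loss on a single training sample: zero only when the output is the one-hot noisy label.
\[
-\log p_\theta(\tilde{y}_i \mid x_i) \;\ge\; 0,
\qquad \text{with equality iff } p_\theta(\tilde{y}_i \mid x_i) = 1 .
\]
```

    The first bound is attained when the output matches the noisy label distribution, while a training loss of exactly 0 requires the output on every sample to equal its one-hot noisy label, which is the memorization regime.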

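    Based on the learning characteristics of DNNs in image classification, it is first proved that the "small-loss assumption", on which most current DNNs designed to be robust to noisy labels rely, has strong limitations, and adversarial datasets are successfully designed to attack such DNNs. Then, in practice, the small-loss assumption is improved in a targeted way using two noise distribution estimation methods, helping the attacked DNNs recover their robustness to noisy labels.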


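    Based on the learning characteristics specific to DNNs in semantic segmentation, a self-iterative unsupervised binary semantic segmentation model (Iterative Ground Truth Training, iGTT) is proposed. iGTT randomly initializes the training labels, refines the model's coarse segmentation output with the MS extraction module proposed in this thesis, and uses the refined predictions as pseudo-labels for the next round of training, iterating in this way. Experiments show that, compared with recently proposed unsupervised semantic segmentation models, iGTT achieves the best performance on several binary semantic segmentation datasets and greatly narrows the performance gap with supervised DNNs. Note that although iGTT uses sample labels during training, all of them are pseudo-labels generated automatically by the model and require no manual annotation, so iGTT can be regarded as an unsupervised learning method in a broad sense.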


    In recent years, deep neural networks (DNNs) based on supervised learning have been widely used in computer vision tasks such as image classification, semantic segmentation, and object detection. Although DNNs have achieved great success, their performance and generalization depend heavily on large amounts of training data and correspondingly accurate training labels, which makes it particularly important to construct a high-quality training dataset for each task.

    In practice, however, constructing a high-quality training dataset faces several difficulties: (1) High labeling cost: manually labeling every sample takes a great deal of time. (2) Need for experts: in specialized fields such as biology, medicine, and finance, dataset construction requires domain experts, which further increases the difficulty. (3) Poor consistency: because of the objective ambiguity of the data or the subjective uncertainty of humans, different annotators labeling the same data may produce inconsistent labels. This is why noisy labels are common in many tasks. Meanwhile, many recent studies have shown that noisy labels greatly affect the learning behavior and generalization of DNNs during training. Reducing the dependence of DNNs on labeled data and improving their robustness under noisy labels are therefore crucial problems.

    To address these difficulties, this study focuses on two basic computer vision tasks, image classification and semantic segmentation, and aims to answer three critical questions: (1) What is the essential difference between the learning behavior of DNNs under noisy labels and under clean labels? (2) Which categories of noisy labels are DNNs more robust to, and which are they more sensitive to? (3) How can the robustness of DNNs under noisy labels be improved? Based on these three questions, the main contributions of this thesis are as follows:

    In the image classification task, we find that the training loss of DNNs can reach 0 even when the labels are randomized, which implies that DNNs are able to memorize all noisy labels. Based on this, we formulate the training process of DNNs trained with noisy labels as a learning phase followed by a memorization phase by dividing the cross-entropy loss into two inequalities. We prove that in the learning phase the output predictions of DNNs converge to the noisy label distributions, and in the memorization phase they converge to the noisy one-hot labels.
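    The memorization effect can be reproduced on a toy scale. The following is a minimal sketch, not the thesis's experiment: an over-parameterized linear softmax classifier stands in for a DNN, and the function name `fit_random_labels` and all hyperparameters are assumptions chosen for illustration. Labels are drawn independently of the inputs, yet plain gradient descent still drives the training loss toward 0.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_random_labels(n=20, d=100, k=3, lr=0.1, steps=1000, seed=0):
    """Fit a linear softmax model (d > n features) to purely random labels."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))
    y = rng.integers(0, k, size=n)        # labels carry no information about x
    onehot = np.eye(k)[y]
    w = np.zeros((d, k))
    losses = []
    for _ in range(steps):
        p = softmax(x @ w)
        losses.append(-np.log(p[np.arange(n), y]).mean())
        w -= lr * x.T @ (p - onehot) / n  # gradient of the mean cross-entropy
    train_acc = (softmax(x @ w).argmax(axis=1) == y).mean()
    return losses[0], losses[-1], train_acc

first_loss, last_loss, train_acc = fit_random_labels()
print(f"loss: {first_loss:.3f} -> {last_loss:.3f}, train accuracy: {train_acc:.2f}")
```

    Because the model has more parameters per class than training samples, any labeling of the data is realizable, so a low training loss here says nothing about generalization.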

    Based on the two-phase formulation of the learning process, we prove the limitations of the "small-loss assumption", which is commonly relied on when designing DNNs that are robust to noisy labels, and successfully attack such models by synthesizing adversarial datasets. We then propose a targeted training strategy based on two noise distribution estimation methods, which restores the robustness of the attacked DNNs under noisy labels.
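    For readers unfamiliar with the small-loss assumption, the selection rule it motivates can be sketched as follows: samples with the smallest training loss are treated as likely clean and kept, and the rest are discarded or down-weighted. This is a generic illustration, not the thesis's method or its proposed improvement; the function names and the `keep_ratio` parameter are hypothetical.

```python
import numpy as np

def per_sample_cross_entropy(probs, labels):
    # probs: (n, k) predicted class probabilities; labels: (n,) integer labels.
    n = probs.shape[0]
    picked = np.clip(probs[np.arange(n), labels], 1e-12, 1.0)
    return -np.log(picked)

def select_small_loss(probs, labels, keep_ratio):
    """Return sorted indices of the keep_ratio fraction with the smallest loss."""
    losses = per_sample_cross_entropy(probs, labels)
    n_keep = max(1, int(round(keep_ratio * len(losses))))
    return np.sort(np.argsort(losses)[:n_keep])

probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
labels = np.array([0, 1, 1, 0])
kept = select_small_loss(probs, labels, keep_ratio=0.5)
print(kept)  # indices of the samples treated as "clean"
```

    The rule implicitly assumes that a small loss indicates a correct label, which is the assumption whose limitations the thesis examines.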

    Since semantic segmentation can generally be viewed as pixel-wise classification, we extend our study of noisy labels from image classification to semantic segmentation. We first divide training labels into four categories according to their characteristics. We then prove theoretically that DNNs learn semantic segmentation from the structural information hidden in the labels rather than from the pixel-level labels themselves. We define this structural information as meta-structures (MS) and formulate MS using spatial point pattern analysis. Furthermore, we quantify the semantics of different labels based on MS, and prove that labels with similar MS have similar semantics and that perturbing MS degrades segmentation performance.

    Based on the learning behavior of DNNs in semantic segmentation, we propose a self-iterative unsupervised semantic segmentation model (Iterative Ground Truth Training, iGTT) for binary-class segmentation. The model exploits the characteristics of DNNs trained with randomly initialized labels (RL), and a meta-structure extraction module refines the MS of the coarsely segmented images. The refined images are then used as labels for the next round of training. Results show that iGTT achieves the best segmentation performance among state-of-the-art unsupervised models on multiple binary-class datasets, and performance competitive with supervised models. Note that although iGTT uses labels during training, all of them are pseudo-labels generated by the model itself, so no manual annotation is needed. iGTT can therefore still be viewed as an unsupervised method.
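    The self-iterative scheme described above can be outlined as a short loop. This is only a structural sketch under stated assumptions, not the iGTT implementation: `train_fn`, `predict_fn`, and `refine_fn` are hypothetical callables standing in for network training, coarse segmentation, and the meta-structure extraction module, none of which are specified in this abstract.

```python
import numpy as np

def iterative_pseudo_label_training(images, train_fn, predict_fn, refine_fn,
                                    rounds=3, seed=0):
    """Alternate between training on pseudo-labels and regenerating them."""
    rng = np.random.default_rng(seed)
    # Start from random binary masks, so no human annotation is involved.
    labels = rng.integers(0, 2, size=images.shape, dtype=np.uint8)
    model = None
    for _ in range(rounds):
        model = train_fn(images, labels, model)  # fit to the current pseudo-labels
        coarse = predict_fn(model, images)       # coarse segmentation output
        labels = refine_fn(coarse)               # refined masks become the next labels
    return model, labels

# Toy stand-ins (assumptions): the "model" is just a global intensity threshold.
images = np.arange(32, dtype=float).reshape(2, 4, 4)
model, masks = iterative_pseudo_label_training(
    images,
    train_fn=lambda imgs, labels, model: imgs.mean(),
    predict_fn=lambda model, imgs: imgs > model,
    refine_fn=lambda coarse: coarse.astype(np.uint8),
)
print(masks.shape, masks.dtype)
```

    Passing the three stages in as callables keeps the loop independent of any particular network or refinement procedure.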

Keywords: deep neural networks; noisy labels; robustness; small-loss assumption; meta-structures
GB/T 7714
罗曜儒 (Luo Yaoru). 深度神经网络在噪声标签下的学习行为研究 [Research on the learning behavior of deep neural networks under noisy labels][D], 2023.
File name/size: 自动化所_罗曜儒毕业论文.pdf (50760 KB)
Document type: thesis
Access: restricted
License: CC BY-NC-SA
