Research on the Learning Behavior of Deep Neural Networks under Noisy Labels
罗曜儒
2023-12
Pages: 134
Degree Type: Doctoral
Chinese Abstract

    In recent years, Deep Neural Networks (DNNs) built on supervised learning have been widely applied to computer vision tasks such as image classification, semantic segmentation, and object detection, where they have achieved great success. The strong performance and generalization ability of these models depend on large amounts of training data with correspondingly accurate labels, which makes constructing high-quality training datasets for different tasks especially important.

    However, constructing high-quality training datasets in practice faces the following difficulties: (1) High labeling cost: a great deal of human effort and time is needed to manually label every sample. (2) Strong expertise required in specialized fields: for domains such as biology, medicine, and finance, dataset construction needs the assistance of domain experts, which further raises the difficulty of annotation. (3) Poor consistency: owing to the objective ambiguity of the data and the subjective uncertainty of annotators, different annotators may mislabel samples, miss labels, or label inconsistently over time. Together, these difficulties make noisy labels pervasive: for example, only a small portion of the data may be labeled while the rest remains unlabeled, or labels in a specialized domain may be produced by non-experts and therefore contain errors. As a growing body of research shows that noisy labels can severely affect the learning behavior and generalization of neural networks during training, it is crucial to effectively reduce the dependence of DNNs on labeled data and to strengthen their robustness under noisy labels.

    To address the difficulties above, this study starts from the two major computer vision tasks of image classification and semantic segmentation and aims to answer three key scientific questions: (1) Does the learning behavior of DNNs under noisy labels differ fundamentally from that under clean labels, and if so, how? (2) To which categories of noisy labels are DNNs relatively robust, and to which are they strongly sensitive? (3) How can the robustness of DNNs under noisy labels be improved? Around these three questions, the main research content and contributions of this dissertation are as follows:

    In the image classification task, experiments show that even when the training labels are completely random, the training loss of DNNs eventually reaches or approaches zero, implying that the model has enough capacity to memorize all noisy labels. Based on this observation, the training process of DNNs under noisy labels is decomposed into a "learning phase" and a "memorization phase" by splitting the cross-entropy loss (Cross Entropy Loss). It is proved that in the learning phase the outputs of DNNs fit the hidden noise distribution regardless of whether the training labels are accurate, whereas in the memorization phase DNNs overfit and their outputs ultimately fit the noisy labels.
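    As a hedged illustration of why the learning phase pulls model outputs toward the hidden noise distribution (a standard entropy-plus-KL identity, not the dissertation's exact inequalities, which are not reproduced here): writing p(·|x) for the label-noise distribution of input x and q_θ(·|x) for the DNN's predicted class distribution, the expected cross-entropy decomposes as

```latex
\mathbb{E}_{\tilde{y}\sim p(\cdot\mid x)}\bigl[-\log q_\theta(\tilde{y}\mid x)\bigr]
  = H\bigl(p(\cdot\mid x)\bigr)
  + \mathrm{KL}\bigl(p(\cdot\mid x)\,\big\|\,q_\theta(\cdot\mid x)\bigr)
```

    The entropy term does not depend on the parameters θ, so minimizing the expected loss minimizes the KL term and drives q_θ toward the noise distribution; pushing the loss on each individual one-hot noisy label to zero beyond this point corresponds to the memorization phase.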

    Building on these learning characteristics of DNNs in image classification, it is first shown that the "small-loss assumption" (Small-loss Assumption), on which most existing noise-robust DNNs rely, has strong limitations, and adversarial datasets are successfully designed to attack such DNNs. Then, in practice, the small-loss assumption is improved in a targeted manner based on two noise-distribution estimation methods, restoring the robustness of the attacked DNNs to noisy labels.

    Addressing the common view that "semantic segmentation can be regarded as per-pixel classification", the study of noisy labels is extended from image classification to semantic segmentation and this assumption is examined. Training labels for semantic segmentation are first divided into four categories according to their characteristics, and it is then proved theoretically that DNNs learn semantic segmentation from the structural information in the labels rather than from per-pixel labels. By defining this structural information as meta-structures (Meta-structure, MS) and drawing on spatial point pattern analysis to give MS a mathematical representation, the semantic information of different labels is quantified; it is shown that labels with similar MS carry similar semantic information, while perturbing MS degrades segmentation performance.

    Based on the learning characteristics specific to DNNs in semantic segmentation, a self-iterative unsupervised binary semantic segmentation model (Iterative Ground Truth Training, iGTT) is proposed. iGTT randomly initializes the training labels, refines the coarse segmentation results output by the model with the MS extraction module proposed in this dissertation, and then uses the predictions as pseudo-labels for the next round of training, iterating in this way. Experimental results show that, compared with recently proposed unsupervised semantic segmentation models, iGTT achieves the best performance on several binary segmentation datasets while greatly narrowing the performance gap with supervised DNNs. Note that although labels are used in training iGTT, all of them are pseudo-labels generated automatically by the model and require no manual annotation, so the proposed iGTT can be regarded as an unsupervised learning method in the broad sense.

English Abstract

    In recent years, Deep Neural Networks (DNNs) based on supervised learning have been widely used in computer vision tasks such as image classification, semantic segmentation, and object detection. Although DNNs have achieved great success, their strong performance and generalization capabilities depend heavily on large amounts of training data and correspondingly high-precision training labels, making it particularly important to construct high-quality training datasets for different tasks.

    However, in reality, constructing a high-quality training dataset faces the following difficulties: (1) High labeling cost: substantial human effort and time are required to manually label each sample. (2) Experts required in specialized fields: for biology, medicine, finance, and other specialized domains, dataset construction requires domain experts, which further increases the difficulty. (3) Poor consistency: due to the objective ambiguity of the data and the subjective uncertainty of humans, different annotators labeling the same data may produce erroneous, missing, or inconsistent annotations. These difficulties explain why noisy labels are common in many tasks. Meanwhile, many recent studies have shown that noisy labels can greatly affect the learning behavior and generalization of DNNs during training. Therefore, how to effectively reduce the dependence of DNNs on labeled data and improve their robustness under noisy labels are crucial problems.

    To address these difficulties, this study focuses on two basic computer vision tasks, i.e., image classification and semantic segmentation, and aims to answer the following three critical questions: (1) What is the essential difference between the learning behavior of DNNs under noisy labels and under clean labels? (2) To which categories of noisy labels are DNNs relatively robust, and to which are they strongly sensitive? (3) How can the robustness of DNNs under noisy labels be improved? Based on these three questions, the main contributions of this dissertation are as follows:

    In the image classification task, we find that the training loss of DNNs can reach 0 even when the labels are completely randomized, which implies that DNNs have sufficient capacity to memorize all noisy labels. Based on this, we formulate the training process of DNNs with noisy labels as a learning phase and a memorization phase by dividing the cross-entropy loss into two inequalities. We prove that in the learning phase the output predictions of DNNs converge to the noisy label distributions, and that in the memorization phase they converge to the noisy one-hot labels.
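    The random-label finding above is straightforward to reproduce; below is a minimal sketch of such an experiment (the toy model and data sizes are illustrative, not those used in the dissertation):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data: random inputs paired with completely random (noisy) labels.
n, d, k = 512, 64, 10
x = torch.randn(n, d)
y = torch.randint(0, k, (n,))  # labels carry no information about x

# A small but over-parameterized MLP classifier.
model = nn.Sequential(nn.Linear(d, 512), nn.ReLU(), nn.Linear(512, k))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

# With enough capacity the training loss approaches 0: the network has
# memorized every random label, the phenomenon described above.
print(f"final training loss: {loss.item():.4f}")
```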

    Based on this two-phase formulation of the learning process, we prove the limitations of the "small-loss assumption", which is commonly used to design noise-robust DNNs, and successfully attack such models by synthesizing adversarial datasets. We then propose a targeted training strategy based on two noise-distribution estimation methods and recover the robustness of the attacked DNNs under noisy labels.
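    For context, the small-loss assumption treats the samples that the current model fits with the lowest loss as probably clean. A minimal sketch of that selection step follows (the function name and keep ratio are illustrative; the dissertation's attack datasets and noise-distribution estimators are not reproduced here):

```python
import torch

def select_small_loss(per_sample_loss: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Return indices of the keep_ratio fraction of samples with the smallest
    loss, which the small-loss assumption treats as probably clean."""
    n_keep = max(1, int(keep_ratio * per_sample_loss.numel()))
    return torch.argsort(per_sample_loss)[:n_keep]

# Example: keep the 60% of samples the current model fits best.
losses = torch.tensor([2.3, 0.1, 0.4, 3.0, 0.2])
print(select_small_loss(losses, keep_ratio=0.6))  # tensor([1, 4, 2])
```

    In this framing, an adversarial dataset for such methods would be one on which a low loss no longer indicates a clean label, so the selection rule keeps corrupted samples.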

    Motivated by the general view that semantic segmentation can be seen as per-pixel classification, we extend the exploration of noisy labels from image classification to semantic segmentation and examine this view. We first divide training labels into four categories based on their characteristics. Then, we theoretically prove that DNNs learn semantic segmentation from the structural information hidden in labels rather than from pixel-level labels. We define this structural information as meta-structures (MS) and formulate MS using spatial point pattern analysis. Furthermore, we quantify the semantics of different labels based on MS, proving that labels with similar MS have similar semantics and that perturbing MS leads to a decrease in segmentation performance.
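    Spatial point pattern analysis provides standard summary statistics for such structures. As one hedged illustration (Ripley's K without edge correction; not necessarily the exact formulation used in the dissertation), the foreground pixels of a binary label can be treated as a point pattern and summarized as follows:

```python
import numpy as np

def ripley_k(mask: np.ndarray, radii: np.ndarray) -> np.ndarray:
    """Empirical Ripley's K for the foreground pixels of a binary mask,
    without edge correction: K(r) = (A / n^2) * #{ordered pairs within r}."""
    pts = np.argwhere(mask > 0).astype(float)  # (n, 2) foreground coordinates
    n, area = len(pts), mask.shape[0] * mask.shape[1]
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                # exclude self-pairs
    return np.array([(area / n**2) * np.sum(d <= r) for r in radii])

# Example: labels whose foreground points yield similar K curves have
# similar spatial structure, even if individual pixel labels differ.
mask = np.zeros((64, 64), dtype=np.uint8)
mask[20:30, 20:30] = 1
print(ripley_k(mask, radii=np.array([1.0, 2.0, 5.0])))
```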

    Based on the learning behavior of DNNs in semantic segmentation, we propose a self-iterative unsupervised model for binary-class semantic segmentation, Iterative Ground Truth Training (iGTT). iGTT starts from randomly initialized training labels; a proposed meta-structure extraction module refines the MS of the coarsely segmented images, and the refined predictions are used as pseudo-labels for the next epoch of training. Results show that iGTT achieves the best segmentation performance compared with state-of-the-art unsupervised models on multiple binary-class datasets, and greatly narrows the performance gap to supervised models. Note that although labels are used when training iGTT, all of them are pseudo-labels generated automatically by the model itself, so no manual annotation is needed and iGTT can still be viewed as an unsupervised method.
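    A schematic sketch of this self-iterative pseudo-label loop is below (all names are illustrative, and refine_ms stands in for the dissertation's meta-structure extraction module, whose details are not reproduced here):

```python
import torch
import torch.nn as nn

def igtt_style_loop(model: nn.Module, images: torch.Tensor, refine_ms,
                    n_rounds: int = 10, n_epochs: int = 5):
    """Schematic self-training loop: random initial pseudo-labels, then
    repeat train -> predict -> refine structure -> reuse as labels."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()
    n, _, h, w = images.shape
    # Randomly initialized binary pseudo-labels: no manual annotation anywhere.
    pseudo = torch.randint(0, 2, (n, 1, h, w)).float()
    for _ in range(n_rounds):
        for _ in range(n_epochs):              # fit the current pseudo-labels
            opt.zero_grad()
            loss = bce(model(images), pseudo)
            loss.backward()
            opt.step()
        with torch.no_grad():                  # coarse binary predictions
            coarse = (torch.sigmoid(model(images)) > 0.5).float()
        pseudo = refine_ms(coarse)             # stand-in for MS refinement
    return model, pseudo
```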

Keywords: Deep Neural Networks; Noisy Labels; Robustness; Small-loss Assumption; Meta-structures
Language: Chinese
Sub-direction Classification (Seven Major Directions): Image/Video Processing and Analysis
State Key Laboratory Planned Research Direction: Fundamental and Frontier Theory of Artificial Intelligence
Associated Dataset to Be Deposited:
Document Type: Doctoral Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/54521
Collection: Graduates_Doctoral Dissertations
Recommended Citation (GB/T 7714):
罗曜儒. 深度神经网络在噪声标签下的学习行为研究[D], 2023.
Files in This Item:
自动化所_罗曜儒毕业论文.pdf (50760 KB) | Document Type: Thesis | Access: Restricted | License: CC BY-NC-SA