CASIA OpenIR > Graduates > Master's Theses
Robust Pattern Recognition for Data Fault Tolerance
李修川
2023-05
Pages: 82
Degree type: Master's
Abstract

In recent years, deep learning has made great breakthroughs in the field of pattern recognition. However, its success relies heavily on high-quality data; noise in the data can substantially degrade performance. For instance, during training, large-scale training data inevitably contains label noise, and deep learning models, owing to their strong fitting capacity, can memorize this noise perfectly, eventually leading to poor generalization. During inference, adversarial attacks can craft adversarial examples by injecting imperceptible adversarial perturbations into images, causing the deep learning model to make wrong decisions. To handle label noise during training and adversarial examples during inference, this thesis designs a robust learning algorithm for each, improving the fault tolerance of deep learning models. The main contributions of this thesis are summarized as follows.

Firstly, this thesis proposes the dynamics-aware loss to handle label noise. The dynamics-aware loss is a dynamic robust loss function. Considering that deep learning models tend to first learn simple, generalizable patterns and then gradually overfit label noise, the dynamics-aware loss gives the model strong fitting ability early in training and then gradually improves its robustness. Moreover, at the later stage of training, to further reduce the negative effect of label noise, it makes the model put more emphasis on easy examples than on hard ones and introduces a bootstrapping term. Both detailed theoretical analyses and extensive experimental results demonstrate the superiority of the dynamics-aware loss over existing robust losses.
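The general idea of a dynamic robust loss can be sketched as follows. This is an illustrative toy, not the thesis's actual formula: it interpolates from cross-entropy (unbounded, strong fitting) toward mean absolute error (bounded, noise-robust) as training progresses, and mixes in a simple bootstrapping term that trusts the model's own confident predictions late in training. The schedule, weights, and the specific bootstrapping form are all assumptions made for illustration.

```python
import numpy as np

def dynamic_robust_loss_sketch(probs, labels, t, T, warmup=0.3):
    """Toy dynamic robust loss (illustrative only, not the thesis formula).

    probs:  (N, C) array of predicted class probabilities
    labels: (N,) array of (possibly noisy) integer labels
    t, T:   current epoch and total epochs; alpha ramps 0 -> 1 after warmup
    """
    p = probs[np.arange(len(labels)), labels]          # prob of the labelled class
    alpha = min(1.0, max(0.0, (t / T - warmup) / (1 - warmup)))
    ce = -np.log(np.clip(p, 1e-12, 1.0))               # unbounded: fits hard examples
    mae = 1.0 - p                                      # bounded: caps noisy examples' pull
    base = (1 - alpha) * ce + alpha * mae              # fitting early, robust late
    # bootstrapping: late in training, also penalize low confidence in the
    # model's own top prediction, so mislabeled targets matter less
    top = np.max(probs, axis=1)
    boot = -top * np.log(np.clip(top, 1e-12, 1.0))
    return float(np.mean(base + alpha * 0.2 * boot))
```

Note how the bounded MAE term caps the gradient contribution of a confidently mislabeled example, while cross-entropy would let it dominate; scheduling between the two is what makes the loss "dynamic".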

Secondly, this thesis proposes boundary sample identification to handle adversarial examples. Boundary sample identification is a detection method that identifies decision-based attacks from the input stream. Given that decision-based attacks issue queries concentrated near the decision boundary of the target model in sample space, which differs markedly from normal query behavior, boundary sample identification detects malicious attacks based on this distinction. Experimental results show that, compared with existing input-stream-based detection methods, boundary sample identification greatly reduces memory and computational overhead, and lowers the false positive rate in specific cases.
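The detection principle described above can be sketched as a minimal stream monitor. This is an assumed illustration, not the thesis's algorithm: it treats a query as a "boundary sample" when the model's top-1/top-2 probability margin is small, and raises an alarm when too many recent queries in a fixed-size window are boundary samples. The class name, margin threshold, and window size are all hypothetical.

```python
from collections import deque

class BoundarySampleDetectorSketch:
    """Toy decision-based-attack detector (illustrative assumptions only).

    Normal users mostly submit natural images, on which the model is
    confident; decision-based attacks probe near the decision boundary,
    where the top two class probabilities are nearly tied.
    """
    def __init__(self, margin=0.1, window=100, max_boundary_ratio=0.5):
        self.margin = margin
        self.max_boundary_ratio = max_boundary_ratio
        self.flags = deque(maxlen=window)          # 1 bit per recent query

    def observe(self, probs):
        """Record one query's output distribution; return True on alarm."""
        top2 = sorted(probs, reverse=True)[:2]
        self.flags.append(1 if top2[0] - top2[1] < self.margin else 0)
        return sum(self.flags) / len(self.flags) > self.max_boundary_ratio
```

Because the state per user is a fixed-length window of single-bit flags rather than a history of full query images, this style of detector illustrates why an input-stream scheme can have a small memory and compute footprint.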

Keywords: pattern recognition, label noise, adversarial examples
Language: Chinese
Seven major directions (sub-direction classification): Fundamentals of Pattern Recognition
State Key Laboratory planning direction: Explainable Artificial Intelligence
Associated dataset requiring deposit:
Document type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/52098
Collection: Graduates — Master's Theses
Recommended citation (GB/T 7714):
李修川. 面向数据容错的鲁棒模式识别[D], 2023.
Files in this item:
Thesis.pdf (3580 KB) — Document type: Thesis; Access: Restricted; License: CC BY-NC-SA

Unless otherwise stated, all content in this repository is protected by copyright, with all rights reserved.