Research on Visual Adversarial Attack Methods in Deep Learning
何子文
2023-05-22
Pages: 102
Degree type: Doctoral
Chinese Abstract

Artificial intelligence (AI) technologies represented by deep learning have made remarkable progress and are now widely applied in fields such as autonomous driving and face recognition. However, the security problems of AI systems are becoming increasingly prominent, especially for computer vision systems built on deep neural network models. Such models are generally fragile: when faced with adversarial examples maliciously crafted by attackers, they readily produce erroneous predictions, creating potential security threats. Consequently, shifting from the pursuit of high performance to ensuring that systems are secure and reliable is a major trend in AI research. Visual adversarial attack is an important direction in the security research of computer vision models; it aims to explore various potential attack techniques and helps build a more comprehensive model security evaluation system.

In recent years, researchers have proposed a variety of adversarial attack methods, but their attack capability remains insufficient. First, the transfer attack success rate and efficiency of existing methods are relatively low, which hinders the robustness evaluation of models in black-box scenarios. Second, the adversarial examples generated by existing methods lack imperceptibility and resistance to detection, so the attacks are easily discovered and the measured security of the model becomes biased. Third, adversarial attack methods targeting non-differentiable modules are still scarce, so effective evaluation means are unavailable for a broad range of vision models. To address these problems, this dissertation proposes new adversarial attack methods; the main research work is summarized as follows:

1) To address the insufficient transferability of adversarial attacks, this dissertation proposes a gradient-inspired model-ensemble adversarial attack method. The method improves the model-ensemble mechanism from the perspective of gradient fusion: by introducing a gradient normalization module and a grouping mechanism, it avoids the negative influence of gradients with extreme magnitudes on the ensemble and effectively improves the transferability of adversarial examples.

2) To address the insufficient transferability of sparse adversarial attacks, this dissertation proposes a data-distribution-guided sparse adversarial attack method. By introducing a prior data distribution, the method alleviates the overfitting of the attack to a single sample and effectively exploits the statistical similarity between the source model and the target model, which substantially improves both the transfer attack success rate and the generation efficiency of sparse adversarial examples.

3) To address the insufficient imperceptibility and detection resistance of adversarial attacks, this dissertation takes face deepfake generation as the concrete task and proposes an adversarial attack method based on latent-space search. By converting the explicit adversarial perturbation in pixel space into a perturbation in the latent space of a generative model, the modification to the adversarial image becomes harder to perceive, which effectively improves the imperceptibility and detection resistance of the attack.

4) To address adversarial attacks on non-differentiable modules, this dissertation takes the binarization module of gait recognition models as the research object and proposes a bypass attack method for non-differentiable modules. By introducing a reverse generation module, adversarial information can successfully bypass the non-differentiable module and continue to propagate, providing an effective solution for adversarial attacks on non-differentiable modules.

Through the above research, this dissertation effectively improves the transferability, the stealthiness, and the capability against non-differentiable modules of adversarial examples, laying a theoretical and methodological foundation for the comprehensive and accurate security measurement of deep neural network models.

English Abstract

Artificial intelligence (AI) technologies, represented by deep learning, have made significant progress and are widely applied in fields such as autonomous driving and facial recognition. However, the security issues of AI systems, especially computer vision systems built on deep neural network models, are increasingly prominent. These models are generally vulnerable: adversarial examples maliciously crafted by attackers can induce erroneous predictions, thereby posing potential security threats to AI systems. Therefore, the trend in AI research is shifting from pursuing high performance to ensuring security and reliability. Visual adversarial attacks are an important direction in computer vision model security research, aimed at exploring various potential attack techniques and helping to build a more comprehensive model security evaluation system.

In recent years, researchers have proposed various adversarial attack methods, but their attack capabilities are still insufficient. Firstly, the transfer attack success rate and efficiency of existing methods are relatively low, which hinders the robustness evaluation of models in black-box scenarios. Secondly, the adversarial examples generated by existing methods are easily detected by human observers or detection models, hindering accurate security measurement. Thirdly, effective adversarial attack methods for non-differentiable modules are lacking, so no effective evaluation means are available for a wide range of vision models. To address these issues, this dissertation proposes novel adversarial attack methods, summarized as follows:

1) To boost the transferability of adversarial attacks, a gradient-inspired model-ensemble adversarial attack method is proposed. This method improves the model-ensemble mechanism from the perspective of gradient fusion and introduces two strategies, a gradient normalization module and a grouping mechanism, to avoid the negative effect of extreme gradient magnitudes on the ensemble, thereby effectively enhancing the transferability of adversarial examples.
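To make the gradient-fusion idea concrete, the following is a minimal PyTorch sketch of one ensemble attack step in which each source model's gradient is normalized before averaging, so that a model with an extreme gradient magnitude cannot dominate the fused direction. It illustrates the general idea only and is not the dissertation's exact algorithm (the grouping mechanism is omitted); `models`, `x`, `y`, `x_orig`, `eps`, and `alpha` are assumed inputs.

```python
import torch
import torch.nn.functional as F

def ensemble_attack_step(models, x, y, x_orig, eps=8/255, alpha=2/255):
    """One iterative step that fuses normalized gradients from several source models."""
    x = x.clone().detach().requires_grad_(True)
    grads = []
    for model in models:
        loss = F.cross_entropy(model(x), y)
        g = torch.autograd.grad(loss, x)[0]
        # Normalize each model's gradient so that models with extreme gradient
        # magnitudes do not dominate the fused update direction.
        grads.append(g / (g.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-12))
    fused = torch.stack(grads).mean(dim=0)
    x_adv = x.detach() + alpha * fused.sign()            # untargeted ascent step
    x_adv = x_orig + (x_adv - x_orig).clamp(-eps, eps)   # project back into the eps-ball
    return x_adv.clamp(0, 1)
```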

2) To boost the transferability of sparse adversarial attacks, a data-distribution-guided sparse adversarial attack method is proposed. This method alleviates the overfitting of adversarial attacks to a single sample by introducing prior data distribution learning, and effectively exploits the statistical similarity between the source model and the target model, thereby significantly improving the transfer attack success rate and the generation efficiency of sparse adversarial examples.
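The sketch below illustrates the general idea of training a sparse perturbation generator over many samples drawn from the data distribution rather than optimizing on a single image, which is one way to reduce per-sample overfitting. The generator architecture, the top-k sparsification, and all names (`generator`, `source_model`, `loader`, `k`) are illustrative assumptions, not the dissertation's concrete formulation.

```python
import torch
import torch.nn.functional as F

def train_sparse_generator(generator, source_model, loader, k=200, eps=16/255, epochs=10):
    """Train a perturbation generator over the data distribution instead of one image."""
    opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
    source_model.eval()
    for _ in range(epochs):
        for x, y in loader:                      # samples drawn from the prior data distribution
            p = torch.tanh(generator(x)) * eps   # dense, magnitude-bounded perturbation
            # Keep only the k largest-magnitude entries per image -> sparse perturbation.
            flat = p.abs().flatten(1)
            thresh = flat.topk(k, dim=1).values[:, -1].view(-1, 1, 1, 1)
            mask = (p.abs() >= thresh).float()
            x_adv = (x + p * mask).clamp(0, 1)
            loss = -F.cross_entropy(source_model(x_adv), y)  # untargeted: push predictions away
            opt.zero_grad()
            loss.backward()
            opt.step()
```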

3) To enhance the imperceptibility and anti-detection capability of adversarial attacks, a deepfake task-specific adversarial attack method based on latent-space search is proposed. This method transforms the explicit adversarial perturbation in pixel space into a perturbation in the latent space of a generative model, making the modification of the adversarial image harder to perceive and effectively enhancing the imperceptibility and anti-detection capability of adversarial attacks.
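A minimal sketch of the latent-space idea follows: instead of optimizing a pixel-space perturbation, a small offset `delta` is optimized on the latent code of a generative model `G` so that the synthesized image misleads the attacked model `victim`. All names, the loss, and the clipping radius are assumptions for illustration rather than the dissertation's concrete method.

```python
import torch
import torch.nn.functional as F

def latent_space_attack(G, victim, z0, y, steps=100, lr=0.01, radius=0.5):
    """Search for an adversarial offset on the latent code instead of the pixels."""
    delta = torch.zeros_like(z0, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_adv = G(z0 + delta)                      # image synthesized from the perturbed latent code
        loss = -F.cross_entropy(victim(x_adv), y)  # drive the attacked model to a wrong prediction
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():                      # keep the search inside a small latent ball
            delta.clamp_(-radius, radius)
    return G(z0 + delta).detach()
```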

4) To address adversarial attacks in non-differentiable scenarios, an adversarial attack method that bypasses the binarization module of gait recognition models is proposed. This method introduces a reverse generation module that enables adversarial information to bypass the non-differentiable module and continue to propagate, providing an effective solution for adversarial attacks on non-differentiable modules.
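The sketch below shows one simple way to let adversarial gradients flow past a non-differentiable binarization step, using a straight-through-style surrogate gradient. The dissertation instead introduces a learned reverse generation module, so this is only meant to illustrate the bypass idea; `gait_pipeline` and `recognizer` are assumed placeholders.

```python
import torch

class BypassBinarize(torch.autograd.Function):
    """Forward: true, non-differentiable thresholding. Backward: pass the gradient through."""
    @staticmethod
    def forward(ctx, x, threshold=0.5):
        return (x > threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None     # gradient w.r.t. x; the threshold gets no gradient

def gait_pipeline(frames, recognizer, threshold=0.5):
    silhouettes = BypassBinarize.apply(frames, threshold)  # binarized silhouettes; gradients bypass the step
    return recognizer(silhouettes)
```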

Through the above research, the adversarial attack transferability, imperceptibility, and capability against non-differentiable modules are effectively enhanced, laying a theoretical and methodological foundation for the comprehensive and accurate security measurement of deep neural networks.

Keywords: Adversarial attack; Adversarial example; Deep neural network; AI security
Language: Chinese
Sub-direction classification (seven major directions): Multimodal Intelligence
State Key Laboratory planning direction classification: Multimodal Collaborative Cognition
Associated dataset to be deposited:
Document type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/51673
Collection: Graduates_Doctoral Dissertations
Recommended citation:
GB/T 7714
何子文. 深度学习中的视觉对抗攻击方法研究[D], 2023.
Files in this item:
File name/size | Document type | Version type | Access | License
深度学习中的视觉对抗攻击方法研究.pdf (13003 KB) | Thesis | | Restricted | CC BY-NC-SA