Research on Model Optimization Techniques for Face Forgery Detection Application Scenarios (面向人脸鉴伪应用场景的模型优化技术研究)

卓文琦

2023-05-22

Pages

Degree type

Master's

Chinese abstract

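The upgrading of smart communication devices and the rapid development of deep learning have created favorable conditions for the widespread popularity of facial content synthesis. The barrier to using these techniques has dropped sharply: internet users can easily obtain face images and videos and use photo-editing software or open-source code for AI face swapping and facial editing, setting off wave after wave of content creation on major social media and public platforms. While the technology is applied in positive settings such as artistic creation, education, and human-computer interaction, the negative impact of its malicious abuse far outweighs the positive impact. On the one hand, synthesis techniques are advancing rapidly, and the forged faces they produce are realistic enough to pass for genuine, which severely undermines the traditional notion that "seeing is believing". On the other hand, malicious forged content often caters to the public's curiosity and has a strong capacity to shape and distort opinion. Many illegal and criminal cases related to face forgery have already occurred in China and abroad; they have seriously damaged personal reputation and property security and pose a major threat to social stability, national cyber sovereignty, and political security. Research on face forgery detection is therefore imperative.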

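Research teams in China and abroad have proposed many effective face forgery detection methods that perform well on open-source forged-face datasets, with detection accuracy repeatedly setting new records. When these methods are carried over to concrete applications, however, problems arise frequently: for example, deployment is constrained by limited computing resources, and models respond too slowly to new types of forgery. Against this background, this thesis focuses on face forgery detection in practical application scenarios and takes the lightweighting and lifelong learning of face forgery detection models as its research priorities. The main work is as follows: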

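1) A model compression method for lightweight face forgery detection is proposed, which makes a face forgery detection model lightweight in two stages. In the first stage, post-training quantization compresses the key parameters of a high-performance face forgery detection model, converting weights and activations from high-bit-width floating-point numbers to low-bit-width integers, which effectively reduces the model's memory footprint and computational cost. In the second stage, a calibration set is obtained and fed into the lightweight model, and the activation ranges are calibrated dynamically over multiple forward passes. Rather than sampling the calibration set from the original training set, the method uses knowledge distillation to exploit the data distribution information stored in the batch normalization layers of the pre-trained model, and uses it to guide a generator in synthesizing a batch of substitute data whose distribution is similar to that of the original training set, so that calibration requires no real data. Because the range of activation values depends on the type of activation function, a theoretical analysis is used to identify ReLU6 as the activation function most friendly to parameter quantization, and the activation functions are replaced accordingly to further safeguard the detection performance of the lightweight model. The method is evaluated extensively on 2 classic forged-face datasets. The results show that it successfully compresses a range of advanced face forgery detection models, and that the resulting lightweight models largely retain the original high performance while requiring significantly less memory and computation.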

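2) A lifelong learning method for face forgery detection based on exemplar-free feature replay is proposed to improve a model's ability to cope with new types of forgery. The method exploits the characteristics of the face forgery detection task. On the one hand, based on the differences between types of forged faces, lifelong learning of the detection model is formulated as a "domain-incremental learning" problem, and the model is trained on the forged-face datasets in successive stages. On the other hand, based on the similarities between types of forged faces, the detection model is split into two parts: a fixed feature extractor and a continually adjusted classifier. In every training stage, a number of reconstructed old-domain features are trained jointly with the current-domain features to mitigate catastrophic forgetting of old knowledge under the exemplar-free condition, which in turn reduces the imbalance in performance across domains. Extensive experiments and analysis are carried out on 6 representative forged-face datasets. The results show that the method adapts to a continuing stream of new domains while still performing well on old domains. Compared with joint training and branch ensembling, model performance is largely preserved while the training resources consumed are greatly reduced.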
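The quantization and range-calibration steps named in contribution 1) can be illustrated with a minimal sketch. It assumes 8-bit unsigned affine quantization and a simple min/max range tracker. The abstract does not state the bit width, the quantization scheme, or the calibration rule, so all function names and parameters below are hypothetical and this is not the thesis implementation. It also shows why a bounded activation such as ReLU6 is convenient: the calibrated range can never exceed [0, 6].

```python
# Minimal sketch, NOT the thesis implementation: 8-bit affine post-training
# quantization of activations with min/max range calibration.
# All names and the 8-bit choice are assumptions made for illustration.

def relu6(x):
    """ReLU6 clips activations to [0, 6], so their range is bounded."""
    return min(max(x, 0.0), 6.0)

def calibrate_range(batches):
    """Track the smallest and largest activation seen over several batches."""
    lo, hi = float("inf"), float("-inf")
    for batch in batches:
        lo = min(lo, min(batch))
        hi = max(hi, max(batch))
    return lo, hi

def quant_params(lo, hi, num_bits=8):
    """Scale and zero point that map [lo, hi] onto unsigned integers."""
    qmax = 2 ** num_bits - 1
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # the range must contain zero
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point

def quantize(x, scale, zero_point, num_bits=8):
    """Float -> integer code, clipped to the representable range."""
    qmax = 2 ** num_bits - 1
    q = round(x / scale) + zero_point
    return min(max(q, 0), qmax)

def dequantize(q, scale, zero_point):
    """Integer code -> approximate float value."""
    return (q - zero_point) * scale

# Usage: calibrate on (here hand-written) activation batches, then quantize.
calibration_batches = [
    [relu6(v) for v in (-2.0, 0.5, 3.0)],
    [relu6(v) for v in (7.5, 1.0)],
]
lo, hi = calibrate_range(calibration_batches)
scale, zero_point = quant_params(lo, hi)
code = quantize(1.0, scale, zero_point)
approx = dequantize(code, scale, zero_point)
```

In the method described above, the calibration batches would come from generator-synthesized data rather than from hand-written values; the sketch only covers how a calibrated range turns into integer codes.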

English abstract

The upgrading of intelligent communication devices and the rapid development of deep learning have provided favorable conditions for the widespread popularity of facial content synthesis. The threshold for technical operations has been greatly reduced: network users can easily obtain facial images and videos and use various editing software or open-source code to perform AI face swapping and facial editing, which has triggered waves of content creation on social media and public platforms. While this technology can be used for positive applications such as artistic creation, education, and human-computer interaction, the negative impact of malicious abuse is much greater than the positive impact. On the one hand, face forgery technology is developing rapidly and can produce ultra-realistic visual content of human faces, which undermines the traditional concept of "seeing is believing". On the other hand, malicious content involving human faces often caters to the curiosity of the public and has a strong ability to shape and distort public opinion. There have already been many illegal and criminal cases related to facial forgery, which have not only caused severe damage to individual reputation and financial security but have also posed great threats to social stability, national internet sovereignty, and political security. Therefore, research on face forgery detection is imperative.

To date, researchers have proposed many effective methods for face forgery detection, which perform well on public datasets with steadily increasing detection accuracy. However, many problems remain when these methods are applied to real-world scenarios. For example, high-accuracy face forgery detection models cannot be deployed directly because of limited computing resources, and a model trained on a specific dataset is unable to deal with newly emerging forgery types. In this thesis, we focus on face forgery detection in real-world scenarios and choose the quantization and lifelong learning of face forgery detection models as the key parts of our research. The main contributions are as follows:

1) A novel model compression method is proposed, which uses a two-stage scheme to obtain lightweight face forgery detection models. In the first stage, the key parameters of a high-accuracy face forgery detection model are compressed through post-training quantization: the weights and activation values are converted from high-bit-width floating-point numbers to low-bit-width integers, which effectively reduces the memory footprint and computational cost. In the second stage, a calibration set is obtained and fed into the lightweight face forgery detection model to calibrate the activation range dynamically during forward passes. We do not use data sampled from the original training set as calibration data. Instead, we mine the statistical information contained in each batch normalization layer of the pre-trained model through knowledge distillation and use it to guide a generator in synthesizing a batch of data whose distribution is similar to that of the original training data. In this way, calibration can be carried out in a data-free setting. In addition, the range of activation values depends on the type of activation function. We therefore identify the most quantization-friendly activation function, ReLU6, through theoretical analysis to further improve the performance of the lightweight model. The proposed method is tested on two classical fake face datasets. The results show that it can successfully compress a series of advanced face forgery detection models into lightweight models that still maintain high performance, while memory consumption and computing resource requirements are significantly reduced.

2) A lifelong learning method is proposed, which uses feature reconstruction and joint feature learning to enable face forgery detection models to cope with new forgery types in an exemplar-free scenario. The method makes full use of the characteristics of face forgery detection. On the one hand, based on the differences among types of forged faces, lifelong learning of a face forgery detection model is formulated as "domain-incremental learning", in which the model is trained on face forgery datasets sequentially. On the other hand, based on the similarity between types of forged faces, the face forgery detection model is split into two parts: a fixed feature extractor and a continuously adjusted classifier. In each training stage, a certain number of reconstructed features originating from old domains and features newly extracted from the current domain are fed to the classifier for joint training, which counters catastrophic forgetting of old knowledge and balances performance across domains. Extensive experiments were carried out on 6 representative fake face datasets. The results show that the proposed method allows a face forgery detection model to adapt to a continuing stream of new domains while still performing well on old domains. Compared with joint training or branch integration, model performance is largely maintained and the cost in training resources is greatly reduced.
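The training loop described in contribution 2) (frozen feature extractor, classifier updated per domain, old-domain features replayed without stored exemplars) can be sketched as follows. The abstract does not say how old-domain features are reconstructed; this sketch assumes, purely for illustration, that per-class Gaussian statistics of the features are stored and sampled. The logistic-regression classifier, the 2-D toy features, and all names are likewise hypothetical and are not taken from the thesis.

```python
# Minimal sketch, NOT the thesis method: domain-incremental training of a
# classifier on frozen features with exemplar-free replay.
# ASSUMPTION: old-domain features are reconstructed by sampling stored
# per-class Gaussian statistics; the thesis may reconstruct them differently.
import math
import random

def feature_stats(features):
    """Per-dimension mean and standard deviation of feature vectors."""
    n, dim = len(features), len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    std = [math.sqrt(sum((f[d] - mean[d]) ** 2 for f in features) / n)
           for d in range(dim)]
    return mean, std

def replay_features(stats, count, rng):
    """Reconstruct pseudo-features from stored statistics (no raw samples)."""
    mean, std = stats
    return [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(count)]

def train_classifier(weights, bias, samples, epochs=100, lr=0.1):
    """SGD on logistic loss; only the classifier parameters change."""
    for _ in range(epochs):
        for x, y in samples:
            z = sum(w * v for w, v in zip(weights, x)) + bias
            z = max(-30.0, min(30.0, z))  # keep exp() well-behaved
            grad = 1.0 / (1.0 + math.exp(-z)) - y
            weights = [w - lr * grad * v for w, v in zip(weights, x)]
            bias -= lr * grad
    return weights, bias

def learn_domain(state, real_feats, fake_feats, rng, replay_per_class=50):
    """Train on the current domain jointly with replayed old-domain features."""
    samples = [(f, 0) for f in real_feats] + [(f, 1) for f in fake_feats]
    for real_stats, fake_stats in state["memory"]:
        samples += [(f, 0) for f in replay_features(real_stats, replay_per_class, rng)]
        samples += [(f, 1) for f in replay_features(fake_stats, replay_per_class, rng)]
    rng.shuffle(samples)
    state["weights"], state["bias"] = train_classifier(
        state["weights"], state["bias"], samples)
    # Store only statistics of this domain, never the features themselves.
    state["memory"].append((feature_stats(real_feats), feature_stats(fake_feats)))
    return state

def accuracy(state, real_feats, fake_feats):
    """Fraction of features classified correctly (label 1 = forged)."""
    samples = [(f, 0) for f in real_feats] + [(f, 1) for f in fake_feats]
    hits = 0
    for x, y in samples:
        z = sum(w * v for w, v in zip(state["weights"], x)) + state["bias"]
        hits += int((z > 0) == (y == 1))
    return hits / len(samples)

def make_cluster(center, count, rng, spread=0.3):
    """Toy stand-in for features produced by the frozen extractor."""
    return [[rng.gauss(c, spread) for c in center] for _ in range(count)]

def demo(seed=0):
    """Learn two toy forgery domains in sequence; return accuracy on each."""
    rng = random.Random(seed)
    state = {"weights": [0.0, 0.0], "bias": 0.0, "memory": []}
    domain_a = (make_cluster((0, 0), 50, rng), make_cluster((4, 0), 50, rng))
    domain_b = (make_cluster((0, 0), 50, rng), make_cluster((0, 4), 50, rng))
    learn_domain(state, *domain_a, rng)
    learn_domain(state, *domain_b, rng)
    return accuracy(state, *domain_a), accuracy(state, *domain_b)
```

Calling `demo()` returns the accuracy on the first and second toy domains after both have been learned. Storing statistics instead of samples is one way to respect the exemplar-free constraint; it trades some fidelity of the replayed features for a memory cost that does not grow with the number of old samples.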

Keywords

face forgery; face forgery detection; lightweight model; data-free model compression; lifelong learning

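Language

Chinese

Seven major directions: sub-direction classification

Image and video processing and analysis

State Key Laboratory planned direction classification

Visual information processing

Associated dataset to be deposited

No

Document type

Thesis

Item identifier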

http://ir.ia.ac.cn/handle/173211/52070

Collection

Graduates_Master's theses

Corresponding author

卓文琦

Recommended citation
GB/T 7714

卓文琦. 面向人脸鉴伪应用场景的模型优化技术研究[D],2023.

Files in this item
File name/size	Document type	Version type	Access type	License
卓文琦-硕士学位论文-最终签字版.pdf (8920KB)	Thesis		Restricted access	CC BY-NC-SA