Research on Portrait Forgery Identification Methods Based on Multi-Domain Forgery Feature Fusion (多域伪造特征融合的人像证伪鉴定方法研究)

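	Research on Portrait Forgery Identification Methods Based on Multi-Domain Forgery Feature Fusion
	Wang Yuan (王源)
	2023-05-22
Pages	122
Degree type	Master's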
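Abstract (translated from Chinese)	With the rapid development of deep learning, computer vision plays an increasingly important role in the current era of artificial intelligence and has become a focus of both academia and industry. Face image analysis, one of the most important research directions in computer vision, is widely used in forensic evidence collection, face animation generation, and digital face authentication. Using advanced generative adversarial networks (GAN) and variational autoencoders (VAE), people can generate realistic, high-quality face images at will that are hard to tell apart with the naked eye, which has driven the rapid growth of face deep-forgery techniques typified by Deepfakes. On the one hand, deep forgery has inspired many entertainment applications, such as expression driving, portrait reenactment, and style editing. On the other hand, it is easily exploited by criminals who synthesize fake speech to produce pornographic films, fake news, and political rumors, causing great harm to public opinion and forensic identification. To mitigate this harm, academia and industry have both explored face forgery video detection in depth, proposed many efficient detection methods based on the multimodal feature domains of face videos and images, and achieved a series of successes on certain manipulation types and forgery datasets. As one important application area, face forgery detection is widely used in forensic portrait forgery identification, helping courts determine the authenticity of digital media evidence to ensure fair and lawful judgments. However, existing detection algorithms have many limitations, such as dependence on specific data distributions and specific compression rates, and lag far behind forged-video generation techniques. Exploring highly robust and highly generalizable face deep-forgery detection algorithms for complex, open real-world scenarios has therefore become an urgent problem in forensic evidence collection. In addition, a complete general-purpose face forgery detector mainly comprises face detection, facial landmark detection and alignment, forgery feature extraction, and feature fusion and classification modules. Generally, the accuracy of facial landmark detection directly determines the performance and generalization of the face forgery detection model.
To address these problems, this thesis takes forensic examination of forged portraits of specific individuals as its entry point, establishes extraction and examination methods for interpretable discriminative features (physiological, physical, and digital), and studies a quantitative video portrait forgery examination model based on multi-feature fusion. It approaches the problem from three angles: 3D facial landmark detection (interpretable physiological and physical features), fine-grained multi-domain forgery feature extraction (interpretable multi-domain digital features), and forgery feature fusion with high-order correlation mining (a multi-domain forgery feature fusion and reasoning model). The aim is to mine interpretable forgery-identification features and to build a highly robust, highly generalizable quantitative examination model for forged faces. The thesis proposes a 3D facial landmark detection algorithm that fuses graph neural networks and heatmap regression, a fine-grained face forgery detection algorithm based on spatial-frequency dynamic graph learning, and a face forgery detection algorithm based on progressive content-texture correlation reasoning. The research content and main contributions are as follows:
(1) How can 3D facial landmarks be regressed directly from unordered point cloud data so as to improve landmark detection accuracy? At present, most 3D facial landmark detection algorithms are traditional ones that extract local geometric features of 3D faces with hand-designed operators or perform template matching against existing 3D face models. Their accuracy is limited by the hand-crafted features or the fixed 3D face models. To address this, the thesis first proposes a 3D facial landmark detection framework based on heatmap regression and graph neural networks; through a geometric deep learning algorithm that fuses graph features, it regresses high-precision 3D facial landmark coordinates directly from 3D point cloud data. Second, the thesis proposes an efficient 3D face heatmap regression algorithm that uses local surface unfolding and local surface registration to fit 3D face surface data with high precision, greatly improving 3D facial landmark detection accuracy. (Corresponds to Chapter 3.)
(2) How can a sound semantic representation of multi-domain face forgery features be designed so as to improve the generalization and robustness of the detection model? First, the thesis studies the spatial-domain and frequency-domain features of face images in depth. Existing forgery detection algorithms often extract frequency-domain features through elaborately designed filter banks, but these filter features lose spatial information and therefore adapt poorly to complex real-world conditions (low illumination, small scale, strong blur). To address this, the thesis proposes a content-aware frequency feature extraction algorithm. By exploring adaptive frequency feature representations and mining the complex interactions between spatial-domain and frequency-domain forgery evidence, it improves the multi-domain feature generalization of the detection model. Second, inspired by the variational theory of image decomposition, the thesis explores the content-space and texture-space features of face images. It proposes an autoencoder based on wavelet multi-resolution analysis that uses a self-supervised pre-training paradigm to better preserve perceptible, structured facial content features and fine-grained texture details, and that shows good robustness to visual compression and noise interference in the external environment. (Corresponds to Chapters 4 and 5.)
(3) How can multi-domain face forgery evidence be fused and reasoned over so as to make the detection model more interpretable? First, for spatial-domain and frequency-domain features, the thesis proposes a multi-domain attention map learning scheme that explores multi-scale, fine-grained spatial-frequency representations, better captures contextual information across different receptive fields, and enriches the semantic representation of spatial-frequency features. A pioneering spatial-frequency feature dynamic graph fusion network is designed; drawing on the multi-domain feature fusion and reasoning capabilities of graph neural networks, it explores the high-order semantic correlations among multi-domain forgery features and thereby improves the generalization and robustness of the detection model. Second, for the content and texture spaces, the thesis proposes a hierarchical learning method that decouples content and texture representations and builds multi-level content-texture attention feature maps. An intra-domain and cross-domain content-texture feature fusion module progressively mines the high-order semantic correlations of content-texture features, enabling collaborative reasoning over multi-domain forgery features and effectively enhancing the interpretability of the detection model. (Corresponds to Chapters 4 and 5.)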
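The heatmap-regression idea in item (1) can be illustrated with a minimal NumPy sketch. It is not the thesis's algorithm: the record does not specify how heatmaps are defined on the point cloud, so the Gaussian of Euclidean distance, the `sigma` value, and the function names `gaussian_heatmap` and `decode_landmark` are all assumptions made for illustration. The sketch assigns each point a heatmap value around a known landmark and then recovers the landmark as the heatmap-weighted centroid.

```python
import numpy as np

def gaussian_heatmap(points, landmark, sigma=0.1):
    """Per-point heatmap: Gaussian of the Euclidean distance to the landmark.

    Assumption: Euclidean distance stands in for whatever surface-aware
    distance a real 3D face pipeline would use.
    """
    d2 = np.sum((points - landmark) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def decode_landmark(points, heatmap):
    """Recover a landmark as the heatmap-weighted centroid of the points."""
    weights = heatmap / heatmap.sum()
    return weights @ points

# Hypothetical toy data: a regular 5x5x5 grid standing in for a point cloud.
grid = np.linspace(-1.0, 1.0, 5)
cloud = np.array([[x, y, z] for x in grid for y in grid for z in grid])
target = gaussian_heatmap(cloud, np.zeros(3), sigma=0.5)
estimate = decode_landmark(cloud, target)
```

In a learned detector, a network would predict the heatmap from the point cloud and the same decoding step would turn the prediction into coordinates.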
English abstract	With the rapid progress of deep learning, computer vision has become an indispensable part of artificial intelligence and a popular topic in both academia and industry. Facial image analysis, one of the most important research directions in computer vision, has wide-ranging applications in forensic evidence, face animation, digital face authentication, and more. Using advanced Generative Adversarial Networks (GAN) and Variational AutoEncoders (VAE), people can produce high-quality facial images at will that are highly realistic and difficult to distinguish from real ones, which has driven the rapid development of deepfake techniques. On the one hand, this technology has sparked numerous face-related entertainment applications, including facial expression synthesis, portrait reenactment, and style editing. On the other hand, face synthesis techniques are easily abused for malicious purposes, such as creating pornographic films, fake news, and political rumors through forged voices and counterfeit videos, which has caused significant damage to public opinion and judicial evaluation. To combat the harmful impact of deepfake technology, both academia and industry have conducted extensive research on face forgery detection. Many efficient forgery detection methods have been proposed for facial images and videos with multi-modal features, with success on specific manipulation types and certain datasets. As a crucial application area, face forgery detection plays a significant role in forensic facial identification, helping courts verify the authenticity of digital media evidence and uphold the fairness and legality of judgments. 
However, current forgery detection algorithms depend heavily on specific data distributions and video compression levels, among other limitations, and lag far behind the synthesis methods for forged videos. There is therefore a pressing need in forensic evidence for face forgery detection models that can handle complex, open real-world scenarios with strong robustness and high generalization. Furthermore, a complete forgery detector comprises face detection, facial landmark detection and alignment, forgery feature extraction, and feature fusion and classification. Notably, the precision of facial landmark detection significantly affects the performance and generalization of the subsequent forgery detection model. This thesis addresses these issues by taking forensic forgery examination of specific individuals as its entry point and introducing an extraction and examination method based on explainable discriminative features, including physiological, physical, and digital characteristics. It presents a quantitative video face forgery detection algorithm based on multi-feature fusion and tackles the problem from three perspectives: 3D facial landmark detection (explainable physiological and physical features), fine-grained multi-domain forgery feature extraction (explainable digital features in multiple domains), and forgery feature fusion with high-order relationship discovery (a multi-domain forgery feature fusion and reasoning model). This approach enables the exploration of explainable discriminative features for face forgery detection and supports the design of a highly robust and generalizable face forgery detection model. 
The thesis proposes three algorithms: a 3D facial landmark detection algorithm combining graph neural networks and heatmap regression, a fine-grained face forgery detection algorithm based on spatial-frequency dynamic graph learning, and a face forgery detection algorithm based on progressive content-texture relation reasoning. The research content and contributions are described below: (1) How can 3D facial landmarks be obtained directly from an unordered point cloud so as to improve landmark precision? At present, most 3D facial landmark detectors are traditional algorithms that use hand-designed operators to extract local geometric features of 3D faces or resort to pre-defined 3D templates. The accuracy of these methods is severely limited by the hand-crafted features and fixed 3D templates. First, we present a 3D facial landmark detection framework based on heatmap regression and graph convolutional networks, which derives high-precision coordinates of 3D facial landmarks directly from the 3D point cloud. Second, we propose an efficient 3D facial heatmap regression algorithm. It leverages Local Surface Unfolding and Local Surface Registration to fit the 3D face surface with high precision, which significantly improves the detection accuracy of 3D facial landmarks. (See Chapter 3.) (2) How can a reliable semantic representation of multi-domain face forgery features be designed to improve the generalization and robustness of detection models? First, the spatial and frequency features of forged faces are studied in depth. Existing forgery detection algorithms often extract frequency features through elaborately designed filter banks. However, these filter features lose spatial information, so they adapt poorly to complex real-world conditions (low illumination, small scale, strong blur). 
To tackle this issue, a content-aware frequency feature extraction method is proposed. By exploring adaptive frequency representations and mining the complex interactions between spatial and frequency forgery clues, it improves the multi-domain generalization of the detection model. Second, inspired by the variational theory of image decomposition, we explore the content-domain and texture-domain features of manipulated images. We propose an autoencoder based on wavelet multi-resolution analysis, which uses a self-supervised pre-training protocol to better retain perceptually structured content and fine-grained texture details. It shows good robustness against visual compression and noise interference in natural environments. (See Chapter 4 and Chapter 5.) (3) How can multi-domain face forgery clues be fused and reasoned over so as to enhance the interpretability of forgery detection models? First, for the spatial and frequency domains, a multi-domain attention map learning scheme is proposed to exploit multi-scale, fine-grained spatial and frequency representations. It captures the context information of different receptive fields and enriches the expressive power of spatial-frequency features. We design a ground-breaking framework based on spatial-frequency feature dynamic graph learning, which uses the reasoning ability of graph neural networks to discover the high-order relationships among multi-domain forgery clues. This further improves the generalization ability and robustness of the face forgery detection model. Second, for the content and texture domains, we present a hierarchical learning method that decouples content and texture representations and builds multi-level content-texture attention maps. 
We design an intra-domain and cross-domain fusion module for content-texture features, which progressively mines the high-order semantic relationships between the content and texture domains, enables collaborative reasoning over multi-domain forgery features, and effectively strengthens the interpretability of the forgery detection model. (See Chapter 4 and Chapter 5.)
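As a rough illustration of the content/texture separation behind the wavelet multi-resolution analysis mentioned in item (2), the sketch below applies a one-level 2D Haar transform with plain NumPy. It is only a sketch under stated assumptions: the abstract does not name the wavelet or describe the autoencoder, so the Haar basis, the function name `haar_split`, and the reading of the low-frequency band as "content" and the high-frequency bands as "texture" are illustrative choices, not the thesis's design.

```python
import numpy as np

def haar_split(img):
    """One-level orthonormal 2D Haar transform of an even-sized 2D array.

    Returns (content, (d_cols, d_rows, d_diag)):
      content: low-frequency band (local 2x2 averages, scaled by 2)
      d_cols:  differences between the left and right columns of each 2x2 block
      d_rows:  differences between the top and bottom rows of each 2x2 block
      d_diag:  diagonal differences within each 2x2 block
    """
    a = img[0::2, 0::2]  # top-left sample of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    content = (a + b + c + d) / 2.0
    d_cols = (a - b + c - d) / 2.0
    d_rows = (a + b - c - d) / 2.0
    d_diag = (a - b - c + d) / 2.0
    return content, (d_cols, d_rows, d_diag)

# Hypothetical input: a random 8x8 array standing in for a grayscale face crop.
rng = np.random.default_rng(0)
face = rng.normal(size=(8, 8))
content, texture = haar_split(face)
```

Because the transform is orthonormal, the energy of the input equals the summed energy of the four sub-bands, so nothing is lost by splitting an image this way; applying `haar_split` again to `content` would give the next, coarser resolution level.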
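Keywords	3D facial landmark detection; heatmap regression; face forgery detection; graph convolutional neural network
Subject area	Artificial intelligence
Discipline category	Engineering::Control Science and Engineering
Indexed by	Other
Language	Chinese
Representative paper	Yes
Seven major directions: sub-direction classification	Image and video processing and analysis
State Key Laboratory planned research direction	Visual information processing
Associated dataset requiring deposit	No
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/51721
Collection	Graduates_Master's theses
Recommended citation (GB/T 7714)	Wang Yuan. Research on Portrait Forgery Identification Methods Based on Multi-Domain Forgery Feature Fusion [D], 2023.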

Files in this item
File name/size	Document type	Version type	Access type	License
硕士学位论文-王源-多域伪造特征融合的人（11307KB）	Thesis		Restricted access	CC BY-NC-SA