CASIA OpenIR > Graduates
受限场景下知识引导的人脸图像编辑研究 (Research on Knowledge-Guided Facial Image Editing in Limited Scenarios)
吕月明 (Yueming Lyu)
2024
Pages: 146
Subtype: Doctoral dissertation

Abstract

The face is the most intuitive biometric trait, and facial images play a significant role in human culture and social interaction. Since ancient times, people have recorded and depicted faces in many forms. With the continuous advancement of modern photographic equipment and the rapid development of internet technology, digital facial photos have become ubiquitous, and people can conveniently edit and beautify facial images with a variety of editing software. In recent years, the flourishing of artificial intelligence has further enhanced the flexibility, controllability, and interactivity of facial image editing, bringing rich innovation and broad possibilities to many industries.

Given the widespread application and significant value of facial image editing technology, it has become a research hotspot in academia and has developed rapidly. However, current facial image editing techniques still face several limited scenarios, chiefly limited samples (scarcity of difficult samples with shadows, occlusions, or large poses and expressions), limited content (insufficient rich textures and complex content), and limited annotations (a limited number of facial attribute annotations). These constraints not only reduce the diversity and completeness of training datasets but also restrict the robustness and generalization of models in practical applications. To overcome these challenges, the overall strategy of this thesis is to introduce different forms of knowledge as guidance on top of data-driven learning.

The main innovations and contributions are summarized as follows:

1. To address the problem of limited difficult samples in training data, a 3D-prior-knowledge-guided makeup editing method is proposed. Makeup editing is a key task in facial component editing: it is closely tied to facial biometric features such as the skin, eyes, and lips, and it holds significant value in applications such as face authentication and privacy protection. A makeup editing model that remains robust under sample limitations is therefore particularly important. Specifically, the method recovers the shape and texture of a face image with a 3D face model and performs editing in texture space with a proposed UV texture generator. To improve editing accuracy and robustness, a makeup adjustment module and a makeup transfer module are introduced, exploiting the facial symmetry prior in UV space. Benefiting from the explicit normalization of pose and expression variations in UV space, the method achieves robust editing even under large poses and expressions. Building on this makeup editing model, its application is further extended to facial privacy protection: by introducing additional 3D facial priors such as 3D viewpoints and 3D face visibility maps, an adversarial makeup generation model is designed. The model adds perturbations to the generated makeup images that effectively conceal the original identity, making it difficult for unauthorized face recognition systems to identify the subject. Extensive qualitative and quantitative experiments demonstrate that the method not only achieves robust makeup editing on difficult samples with shadows, occlusions, and large poses and expressions, but also, through the makeup attack mechanism, protects personal identity from unauthorized face recognition systems with a high success rate.
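The facial-symmetry prior in UV space that the modules above rely on can be illustrated with a minimal sketch (not the thesis implementation): once the texture is unwrapped to a left-right-symmetric UV layout, a texel hidden by occlusion or a large pose can be approximated by its horizontal mirror. The function name and the visibility-map convention are assumptions for this example.

```python
import numpy as np

def complete_uv_texture(uv_tex: np.ndarray, visibility: np.ndarray) -> np.ndarray:
    """Fill invisible UV texels with their horizontally mirrored counterparts.

    uv_tex:     (H, W, 3) face texture unwrapped into UV space.
    visibility: (H, W) boolean map, True where the texel was observed in
                the input image (e.g. derived from 3D face fitting).

    Because a face is roughly left-right symmetric and the UV layout is
    symmetric about its vertical axis, a texel hidden by pose or occlusion
    can be approximated by its mirror texel when that one was observed.
    """
    mirrored_tex = uv_tex[:, ::-1]        # flip texture along the U axis
    mirrored_vis = visibility[:, ::-1]    # flip visibility the same way
    # Use the mirror texel only where the original is invisible
    # but its mirror counterpart is visible; otherwise keep the original.
    use_mirror = (~visibility) & mirrored_vis
    out = uv_tex.copy()
    out[use_mirror] = mirrored_tex[use_mirror]
    return out
```

A real system would blend rather than hard-copy mirrored texels, since faces are only approximately symmetric; the sketch only shows why the UV parameterization makes the symmetry prior trivial to apply.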

2. To address the problem of limited complex content in training data, a hierarchical-knowledge-guided detailed region-adaptive normalization method is proposed to achieve precise editing of individual facial components. The method adaptively encodes style features from coarse to fine, accurately generating fine-grained styles and textures while still capturing coarse-grained properties such as overall tone. Specifically, spatial-aware pyramid pooling is first proposed to construct a style pyramid that computes style parameters at multiple levels, representing multi-level style information. A dynamic gating mechanism is then proposed to integrate the style information across levels, learning which style features matter most for each component region of a given input. Through this mechanism, the model adaptively fuses features, ensuring that the generated style both conforms to the overall style and exhibits fine variation in detail. To comprehensively evaluate the method, a test dataset named Makeup-Complex is constructed, containing complex makeup samples under various poses and expressions, and is used to test the method's fine-grained editing capability under limited complex content. Experimental results show that the method accurately generates results with fine-grained detail.
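The coarse-to-fine style encoding can be sketched roughly as follows. This is an illustrative NumPy mock-up under assumed names: in the thesis the pyramid statistics would modulate normalization layers per component region and the gates would be learned, whereas here the gate logits are simply passed in and the "style" at each level is reduced to per-cell means.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def style_pyramid(feat: np.ndarray, levels=(1, 2, 4)) -> np.ndarray:
    """Compute a multi-level 'style pyramid' for a (C, H, W) feature map.

    Level k splits the map into a k x k grid, averages each cell, and
    broadcasts the cell mean back over the cell, so every level yields a
    spatially varying style at its own granularity (level 1 = global
    tone, higher levels = progressively more local statistics).
    """
    C, H, W = feat.shape
    pyramid = []
    for k in levels:
        hs, ws = H // k, W // k
        styled = np.empty_like(feat)
        for i in range(k):
            for j in range(k):
                cell = feat[:, i*hs:(i+1)*hs, j*ws:(j+1)*ws]
                styled[:, i*hs:(i+1)*hs, j*ws:(j+1)*ws] = \
                    cell.mean(axis=(1, 2), keepdims=True)
        pyramid.append(styled)
    return np.stack(pyramid)              # shape (L, C, H, W)

def gated_fuse(pyramid: np.ndarray, gate_logits: np.ndarray) -> np.ndarray:
    """Fuse pyramid levels with per-level gates (softmax over L levels)."""
    gates = softmax(gate_logits)          # (L,), sums to 1
    return np.tensordot(gates, pyramid, axes=(0, 0))
```

With the gates learned per component region instead of given, this is the shape of the dynamic gating idea: the network decides, region by region, how much global tone versus local texture statistics should drive the normalization.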

3. To address the problem of limited attribute annotations in training data, and inspired by text-image pre-training models such as CLIP, a cross-modal semantic-knowledge-guided universal attribute editing method is proposed. Trained on massive image-text pairs, CLIP establishes rich semantic correlations between images and text; the method therefore leverages CLIP's image-text knowledge to represent diverse facial attributes, reducing the dependence on attribute annotations. Specifically, the concept of the CLIP DeltaSpace is first introduced and analyzed in depth, demonstrating its good alignment property. Based on this property, a DeltaEdit method is further proposed that achieves effective, general editing on two different types of generative models. To broaden its applicability, a style-conditioned diffusion model is also introduced, which incorporates the semantic space of StyleGAN to condition the forward and reverse processes of the diffusion model. The method is validated with experiments on latent space interpolation, real image reconstruction, style mixing, and text-driven editing, and performs strongly in all of these tasks.
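The delta-space idea can be sketched as follows. This is an illustrative simplification with assumed names; the `mapper` network, which the text-free training scheme would fit on image-image deltas, is left abstract here.

```python
import numpy as np

def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Project an embedding onto the unit sphere, as CLIP retrieval does."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def delta_direction(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """Normalized difference of two CLIP embeddings (a 'delta-space' vector).

    During training, emb_a and emb_b would be CLIP image embeddings of two
    images; at inference, CLIP text embeddings of a source and a target
    prompt. The approximate alignment of these two delta distributions is
    what lets a mapper trained only on image deltas accept text deltas.
    """
    return l2_normalize(l2_normalize(emb_b) - l2_normalize(emb_a))

def apply_edit(latent: np.ndarray, mapper, delta_clip: np.ndarray,
               strength: float = 1.0) -> np.ndarray:
    """Shift a generator latent along the offset a (hypothetical) mapper
    predicts from the CLIP-space delta."""
    return latent + strength * mapper(delta_clip)
```

Because image-embedding deltas and text-embedding deltas are approximately aligned in CLIP space, a mapper fitted only on the former can be driven by the latter at inference time; that alignment property is what the analysis above establishes.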

Keywords: Limited Scenarios; Facial Image Editing; Generative Adversarial Networks; Diffusion Models
Language: Chinese
Document Type: Degree thesis
Identifier: http://ir.ia.ac.cn/handle/173211/56566
Collection: Graduates / Doctoral Dissertations
Recommended Citation
GB/T 7714
吕月明. 受限场景下知识引导的人脸图像编辑研究[D]. 2024.
Files in This Item:
File Name/Size: 受限场景下知识引导的人脸图像编辑研究.p (32704 KB) | DocType: Degree thesis | Access: Restricted | License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.