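Research on High-Fidelity Face Image Generation in Open Environments (开放环境下的高保真人脸图像生成研究)
Fu Chaoyou (傅朝友)
Subtype: Doctoral
Thesis Advisor: He Ran (赫然)
2022-05-19
Degree Grantor: Institute of Automation, Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: face image generation; open environment; dual variational generation; heterogeneous face recognition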
Abstract

Face image generation is an important research direction in machine learning and computer vision, with broad application prospects in interactive entertainment and public security. In recent years, with the rise of deep learning, face image generation has developed rapidly. In closed environments, current generative methods can already produce photo-realistic face images. However, when facing complex factors in open environments, such as small sample sizes, large poses, and cross-spectral imaging, current methods still have many problems. For example, usually only a small amount of heterogeneous face data can be collected in open environments, which makes it difficult for current methods that rely on large-scale training data to generate diverse, high-fidelity heterogeneous face images. This thesis studies high-fidelity face image generation in open environments through three specific tasks: unconditional face generation with small-scale samples, face manipulation with large poses, and cross-spectral face translation. The main contributions are as follows:

1. We propose an unconditional dual variational generation approach to tackle the shortage of heterogeneous face data: a large number of diverse paired heterogeneous images are generated from sampled noise and used as data augmentation to improve the performance of Heterogeneous Face Recognition (HFR) models. The approach leads to two progressive methods. The first method trains a carefully designed dual variational generator directly on the small-scale heterogeneous dataset, learning the joint distribution of the paired data while constraining their identity consistency. After training, the newly generated paired data form positive pairs, which are used to train the HFR model to learn domain-invariant identity representations. However, because the heterogeneous dataset is small, the identity diversity of the newly generated data is insufficient. To address this, the second method introduces an orthogonal constraint to disentangle identities from attributes. In this way, the abundant identity information in large-scale visible face images can be integrated into the generated heterogeneous data, greatly increasing their identity diversity. Face images generated from different samplings can then be treated approximately as negative pairs, which are used to train the HFR model to learn highly discriminative identity representations. Extensive experiments verify the effectiveness of both methods.

2. We propose a disentangling method based on a structure prior for high-resolution face manipulation with large poses. The method uses two correlated stages to simplify this complex task. The first stage introduces a conditional generator that predicts the target face boundary map, which models pose and expression. For poses and expressions that do not exist in the dataset, the method uses a conditional regression loss to train the generator indirectly. The second stage generates the target face image under the guidance of the face boundary map. A proxy network and a feature threshold loss help disentangle structure from texture, leading to finer results. Extensive experiments verify the effectiveness of this method. In addition, we release a new high-quality multi-view face dataset, MVF-HQ (High-Quality Multi-View Face), which contains 120,283 images at resolutions up to 6000x4000. Compared with other publicly available high-resolution face datasets, MVF-HQ has significant advantages in data scale, resolution, and face attributes.

3. We propose a probabilistic pixel-wise spectrum translation method with shape alignment. Paired heterogeneous face images always exhibit shape discrepancies, which increase the difficulty of spectrum translation. To address this, the method adopts a two-stage strategy: aligned paired data are produced first and are then used to train the spectrum translator. With the help of a 3D face model, the method aligns the face shape in one spectrum to the face shape in the other. However, because the 3D face model cannot capture the shape of face attributes such as hair and glasses, pixel discrepancies remain between the aligned pairs. The method therefore introduces a probabilistic pixel-wise loss that automatically identifies these pixel discrepancies while the spectrum translator is being trained, allowing the translator to focus solely on spectrum translation. Extensive experiments show that the method achieves good performance even with a lightweight network architecture.
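Contribution 1 describes how generated data are organized for HFR training: the two images produced from one noise sample form a positive pair, and, once identity diversity is high enough, images from different samplings are treated approximately as negative pairs. The following is a minimal sketch of that bookkeeping only. The function name `build_pairs`, the tuple layout, and the 1/0 labels are illustrative assumptions, not the thesis implementation.

```python
from itertools import product


def build_pairs(generated):
    """Organize generated data into labeled pairs (illustrative only).

    `generated` is a list of (img_a, img_b) tuples, one tuple per noise
    sample, where img_a and img_b are the two spectral views produced
    together. The layout and labels are assumptions for this sketch.
    """
    # Views generated from the same sampling share an identity: label 1.
    positives = [(a, b, 1) for a, b in generated]
    # Views from different samplings are treated as different
    # identities (an approximation, as the abstract notes): label 0.
    negatives = [
        (generated[i][0], generated[j][1], 0)
        for i, j in product(range(len(generated)), repeat=2)
        if i != j
    ]
    return positives, negatives
```

With n samplings this yields n positive pairs and n(n-1) cross-spectral negative pairs, so in practice the negatives would usually be subsampled.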
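Contribution 1 also mentions an orthogonal constraint that disentangles identity from attributes. The abstract does not give its formula. One common way to express such a constraint, assumed here purely for illustration, is to penalize the squared cosine similarity between an identity vector and an attribute vector, which is zero exactly when the two are orthogonal. The names `cosine` and `orthogonal_penalty` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Guard against division by zero for all-zero vectors.
    return dot / max(norm_u * norm_v, 1e-12)


def orthogonal_penalty(identity, attribute):
    """Squared cosine similarity (an assumed form of the constraint).

    Returns 0.0 when the vectors are orthogonal and approaches 1.0
    as they become parallel or anti-parallel.
    """
    return cosine(identity, attribute) ** 2
```

Minimizing this penalty during training would push the two representations toward carrying non-overlapping information, which is the stated purpose of the constraint.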
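Contribution 2 names a feature threshold loss but does not define it in the abstract. As one plausible reading, and only as an assumption, the sketch below uses a hinge-style form: the mean absolute distance between two feature vectors is penalized only when it exceeds a threshold. The function name, the L1 distance, and the `threshold` parameter are all hypothetical choices.

```python
def feature_threshold_loss(feat_generated, feat_reference, threshold):
    """Hinge-style thresholded feature distance (assumed form).

    Computes the mean absolute difference between two equal-length
    feature vectors and returns only the part above `threshold`.
    """
    distance = sum(
        abs(g - r) for g, r in zip(feat_generated, feat_reference)
    ) / len(feat_generated)
    # No penalty while the features stay within the threshold.
    return max(0.0, distance - threshold)
```

A loss of this shape leaves small feature differences unpenalized, which is one way a threshold could trade off structure against texture.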
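Contribution 3 describes a probabilistic pixel-wise loss that discovers residual pixel discrepancies during training. The abstract gives no formula, so the sketch below assumes a standard heteroscedastic Gaussian negative log-likelihood, in which the network predicts a per-pixel log-variance alongside each pixel value. The function name and the flat-list inputs are illustrative assumptions.

```python
import math


def probabilistic_pixel_loss(pred, log_var, target):
    """Per-pixel Gaussian negative log-likelihood (assumed form).

    pred, log_var, and target are equal-length lists of floats.
    A large predicted log-variance down-weights the squared error at
    that pixel but is itself penalized by the 0.5 * log_var term.
    """
    total = 0.0
    for mu, s, y in zip(pred, log_var, target):
        total += 0.5 * math.exp(-s) * (y - mu) ** 2 + 0.5 * s
    return total / len(pred)
```

Under this assumed form, a pixel that cannot be matched (for example hair or glasses left misaligned) costs less when the model assigns it high variance, so such regions are discounted automatically while well-aligned pixels keep a low variance and a sharp penalty.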

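Pages: 128
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/48656
Collection: Graduates_Doctoral Theses
Intelligent Perception and Computing
Corresponding Author: Fu Chaoyou (傅朝友)
Recommended Citation
GB/T 7714
Fu Chaoyou. Research on High-Fidelity Face Image Generation in Open Environments [D]. Institute of Automation, Chinese Academy of Sciences. Institute of Automation, Chinese Academy of Sciences, 2022.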
Files in This Item:
File Name/Size | DocType | Version | Access | License
开放环境下的高保真人脸图像生成研究.pd (98636KB) | Thesis | | Restricted | CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.