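Face Makeup Analysis Based on Generative Models (基于生成模型的人脸妆容分析)
Author: 李祎 (Li Yi)
Subtype: Doctoral
Thesis Advisor: 谭铁牛 (Tan Tieniu)
Date: 2020-05-31
Degree Grantor: University of Chinese Academy of Sciences (中国科学院大学)
Place of Conferral: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: face image synthesis; makeup analysis; adversarial learning; semantic guidance; disentangled representation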
Abstract

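Makeup is one of the most common ways for people to enhance their attractiveness or change their facial appearance in daily life. For example, foundation can cover facial blemishes, while blush can be used to contour the face. With the growing popularity of makeup and the continuous development of related products, makeup analysis has attracted wide attention in computer vision in recent years. As an emerging research topic, makeup analysis meets many practical needs, such as virtual try-on at cosmetics counters and special effects in film and television. Although related research has made some progress, makeup analysis still faces many challenges. First, applying makeup is itself highly uncertain and variable, which makes it very difficult to match face images before and after makeup accurately. In addition, complex variations in facial pose and expression in loosely controlled environments, and the data misalignment they cause, are difficulties that makeup research cannot avoid. Building on recent advances in deep generative models, this thesis addresses these challenges and studies, through image synthesis, three makeup analysis tasks: cross-makeup face verification, makeup removal, and makeup transformation. The main results fall into the following three parts.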

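  1. To address the sharp drop in face verification performance caused by makeup, this thesis proposes a generative "synthesize first, then verify" approach and designs a bi-level adversarial network for cross-makeup face verification. To remove the side effects of makeup as far as possible, the approach first converts the makeup face image into a non-makeup face image, then extracts identity features from the synthesized output and matches them against real non-makeup images. Specifically, the bi-level adversarial network contains adversarial learning mechanisms at two levels, image and feature. Adversarial learning at the image level is used to obtain good non-makeup image synthesis, while adversarial learning at the feature level ensures that identity information is unchanged after the image conversion. To further improve the synthesis quality of the bi-level adversarial network, the thesis improves the generator structure and designs a two-path perception network. As the name suggests, this network has two branches: one synthesizes the global structure, and the other handles local details in specific regions. The advantage of the "synthesize first, then verify" approach is that it improves the performance of existing face verification methods on cross-makeup verification without changing their parameters. Experimental results show that, by converting makeup face images into non-makeup face images, the method improves cross-makeup face verification performance on three databases.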

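  2. To address face makeup removal in loosely controlled environments, this thesis takes full account of the variations in face shape, such as pose and expression, that are common in unpaired data, and proposes a semantic-guided face makeup removal method. The thesis argues that, given a realistic visual result, the ideal output should differ from the input image only in the regions related to facial makeup, and other image regions should be kept unchanged as far as possible. The method therefore draws inspiration from the real makeup removal process: facial makeup is the combined result of multiple cosmetics applied to facial skin, and different makeup regions show clearly different effects, so different removal schemes are needed for different makeup problems. Concretely, the method includes two semantic-guided learning strategies. At the image level, it proposes an attention module that locates facial makeup without supervision and represents how heavy the makeup is. At the feature level, it designs a texture loss function guided by semantic maps, which provides more reliable supervision during network training for synthesizing realistic non-makeup texture details. In addition, to further promote research on face makeup, the thesis collects makeup-related face images from loosely controlled environments and builds the cross-makeup face database CMF (Cross-Makeup Face). Experimental results show that the method achieves high visual quality in makeup removal even when face shape varies in pose and expression, and that it thereby achieves better cross-makeup face verification performance than the previous method.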

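  3. To address makeup transformation in loosely controlled environments, this thesis proposes a makeup image disentanglement method that performs makeup removal, automatic makeup application, and makeup transfer in a single deep generative network. Overall, a makeup face image contains three main parts: makeup, identity information, and shape structure (for example, expression and pose variations). For makeup and identity information, the method uses a disentangled representation learning strategy to decouple the image features into a makeup representation and an identity representation. The makeup representation controls the makeup effect of the output image, while the identity representation supervises the output so that its identity information is unchanged. The method treats the other variation factors in the image as inherent information of the input image that should remain unchanged during synthesis. It therefore introduces a dense correspondence field to handle shape structure variations such as pose and expression, so that the geometric prior of the input face is kept in the output image. In addition, the method reuses the semantic-map-guided texture loss function from the previous method to achieve realistic transformation of makeup details. Experimental results show that the proposed makeup image disentanglement network outputs different makeup transformation results according to different makeup representations, with high image quality and good visual effect. Following the "synthesize first, then verify" idea, the thesis also runs cross-makeup face verification tests on the makeup-removed images produced by the disentanglement network, and the results show that the method achieves leading verification results at the time of writing. This further confirms the feasibility and effectiveness of the "synthesize first, then verify" approach.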
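
The "verification after generation" idea in the first contribution (synthesize a non-makeup face first, then verify identity with an unchanged feature extractor) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `verify_after_generation`, the `generator` and `feature_extractor` callables, the flattened-list image format, the cosine-similarity score, and the `threshold` value are hypothetical and are not taken from the thesis.

```python
import math
from typing import Callable, List, Sequence, Tuple

Image = Sequence[float]    # toy stand-in: a flattened image
Feature = Sequence[float]  # toy stand-in: an identity feature vector


def cosine_similarity(a: Feature, b: Feature) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify_after_generation(
    makeup_image: Image,
    reference_image: Image,
    generator: Callable[[Image], Image],
    feature_extractor: Callable[[Image], Feature],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> Tuple[float, bool]:
    """Synthesize a non-makeup image, then compare identity features.

    The feature extractor is used as-is: only the input image changes,
    which mirrors the claim that existing verifiers need no retraining.
    """
    synthesized = generator(makeup_image)
    score = cosine_similarity(
        feature_extractor(synthesized),
        feature_extractor(reference_image),
    )
    return score, score >= threshold


if __name__ == "__main__":
    # Toy stubs: "makeup" is a constant offset that the generator removes,
    # and the "features" are the pixels themselves.
    def toy_generator(img: Image) -> List[float]:
        return [p - 0.5 for p in img]

    def toy_extractor(img: Image) -> List[float]:
        return list(img)

    score, same_person = verify_after_generation(
        [1.5, 0.5, 0.5], [1.0, 0.0, 0.0], toy_generator, toy_extractor
    )
    print(score, same_person)
```

In a real system the two callables would be a trained makeup-removal generator and a pretrained face recognition network; the sketch only shows how the two stages are chained.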

Other Abstract

Makeup is widely used to enhance attractiveness or alter appearance. For instance, powder foundation can hide skin imperfections, and blush can make the cheeks look fuller. With the popularity of wearing makeup and the development of cosmetic products, makeup analysis has recently attracted much attention in computer vision. As an emerging topic, makeup analysis is in demand in various scenarios, e.g., virtual makeup at cosmetic counters and special effects in movies. Despite recent progress, makeup analysis still faces challenges. For instance, matching images before and after makeup is hindered by the indeterminacy and variability of makeup. Unpaired data caused by pose and expression variations is another pressing problem. Inspired by the substantial progress of deep generative models, we propose to cope with these challenges and address makeup analysis tasks, including makeup-invariant face verification, makeup removal, and makeup transfer, in a generative manner. The main contributions of the thesis are summarized as follows.

  1. To address the problem that cosmetics significantly degrade the performance of face verification approaches, we propose a "verification after generation" solution and design a bi-level adversarial network (BLAN) to achieve makeup-invariant face verification. To alleviate the negative effects of makeup, we first generate non-makeup images from makeup ones, and then use the synthesized non-makeup images for verification. Specifically, BLAN contains two adversarial sub-networks on different levels: one on the pixel level for reconstructing appealing facial images, and the other on the feature level for preserving identity information. For the non-makeup image generation module, a two-path network that models both global and local structures is applied to improve the synthesis quality. One advantage of our generative approach is that it extends existing facial feature extraction models to makeup problems without retraining or fine-tuning the underlying models. Experimental results on three datasets demonstrate that verification performance is improved by generating non-makeup faces from makeup ones.

  2. For face makeup removal, we propose a semantic-aware makeup cleanser (SAMC) to remove facial makeup under different poses and expressions. We argue that, given realistic visual quality, all factors in the output other than makeup should remain the same as in the input. The intuition is that makeup is the combined effect of multiple cosmetics, so tailored treatments should be applied to different cosmetic regions. Hence, we present two semantic-aware learning strategies in SAMC. On the image level, an unsupervised attention module is jointly learned with the generator to locate cosmetic regions and estimate the degree of makeup. On the feature level, we use face parsing only in the training phase and design a localized texture loss as a complement to improve synthesis quality. In addition, a new Cross-Makeup Face (CMF) benchmark dataset of in-the-wild makeup portraits is built to advance related research. The experimental results verify that SAMC not only produces appealing de-makeup outputs, but also improves makeup-invariant face verification through its better generation quality.

  3. To fulfill multiple makeup editing tasks in a unified network, we propose a disentangled feature learning approach for makeup portraits in the wild, named MUP-D. Overall, a makeup portrait can be decomposed into three components: makeup, identity, and geometry (including expression, pose, etc.). We assume that the extracted image representation can be decomposed into a makeup code that captures the makeup style and an identity code that preserves the source identity. The makeup code can be applied to makeup transfer, while the identity code provides strong supervision over the output. We regard the other variation factors as native structures of the source image that should be preserved. Thus, a dense correspondence field is integrated into the network to preserve the geometry of the face. To encourage appealing visual results after makeup transfer, we employ the semantic-aware texture loss to learn fine-grained makeup styles. Both visual and quantitative experimental results demonstrate the superiority of the proposed method and of the "verification after generation" idea.
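
The second contribution describes an attention module that locates cosmetic regions and estimates the degree of makeup, with the aim of changing only those regions. One common way to use such a map, shown here purely as an assumption since the abstract does not give the formula, is to blend the generator output with the input per pixel: `output = m * generated + (1 - m) * source`, where `m` lies in [0, 1]. The function name `blend_with_attention` and the flattened-list image format are hypothetical.

```python
from typing import List, Sequence


def blend_with_attention(
    source: Sequence[float],
    generated: Sequence[float],
    attention: Sequence[float],
) -> List[float]:
    """Blend per pixel: attention 1.0 takes the generated value,
    attention 0.0 keeps the source value unchanged."""
    if not (len(source) == len(generated) == len(attention)):
        raise ValueError("source, generated and attention must have equal length")
    output = []
    for x, g, m in zip(source, generated, attention):
        if not 0.0 <= m <= 1.0:
            raise ValueError("attention values must lie in [0, 1]")
        output.append(m * g + (1.0 - m) * x)
    return output


if __name__ == "__main__":
    source = [0.2, 0.8, 0.5]      # input makeup image (toy values)
    generated = [0.6, 0.1, 0.9]   # generator's de-makeup proposal
    attention = [0.0, 1.0, 0.5]   # 0 = no makeup, 1 = heavy makeup
    print(blend_with_attention(source, generated, attention))
```

With this formulation, pixels where the attention value is zero pass through untouched, which matches the stated goal of keeping non-makeup regions identical to the input.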

Pages: 166
Language: Chinese
Document Type: Dissertation (学位论文)
Identifier: http://ir.ia.ac.cn/handle/173211/39151
Collection: Intelligent Perception and Computing (智能感知与计算)
Recommended Citation
GB/T 7714
李祎. 基于生成模型的人脸妆容分析[D]. 中国科学院自动化研究所. 中国科学院大学,2020.
Files in This Item:
File Name/Size: 李祎-基于生成模型的人脸妆容分析-sig (14427KB)
DocType: Dissertation (学位论文)
Access: Open access (开放获取)
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.