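Facial Image Age Synthesis and Analysis Based on Generative Learning (基于生成学习的人脸图像年龄合成与分析)
Li Peipei (李佩佩)
2021-05
Pages: 150
Degree type: Doctoral
Abstract (translated from the Chinese)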

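Age is a natural marker with a biological basis. For thousands of years, humans have never stopped exploring the aging process, and all kinds of demographic phenomena are closely tied to age. As an emerging direction in computer vision, facial age synthesis and analysis has both theoretical significance and practical demand, for example age estimation and cross-age face recognition in security, and digital face aging and de-aging in film and television. Although related research has made some progress, facial age synthesis and analysis still faces many challenges. On the one hand, facial aging is affected by many factors, both internal (such as genetics) and external (the living environment), and is therefore highly uncertain. On the other hand, complex variations in face pose and expression under loosely controlled conditions are an unavoidable difficulty in facial age research. Building on deep generative learning, this thesis addresses these problems by studying facial age from three aspects: facial image pre-processing, facial image age synthesis, and facial image age analysis. The research results of this thesis are as follows: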

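1. Two generative-learning-based face pre-processing methods are proposed: generative-learning-based face pose frontalization and dual-generation-based face parsing. For the first task, this thesis builds a high-definition face image dataset of 229 subjects and 62 pitch poses, totaling more than 790,000 images at a resolution of 1920×1080. Among publicly available datasets in China and abroad, it is currently the face pose analysis database with the most poses, the largest number of images, and the highest resolution. In addition, on top of a conventional generative adversarial network, this thesis proposes a face-parsing-guided local discriminator network to help improve face pose frontalization. Experiments show that the method removes pose interference without losing identity information. For the second task, this thesis proposes a dual-structure disentangling variational generation network that can generate, from scratch, large numbers of virtual face images that do not exist in the real world, together with their face parsing labels. Adding the generated images to the real training set and training on both improves the face parsing model. To make better use of the synthetic data, this thesis also proposes a label-error-tolerant algorithm. Experimental results show that the proposed methods help improve the performance of face parsing models.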
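2. Two facial image age synthesis methods are proposed: a global and local consistent age generative adversarial network, and a wavelet-domain global and local consistent age generative adversarial network. The first model targets the blurring of local texture details in facial age synthesis. On top of a conventional generative adversarial network, it introduces local pathways that focus on age-critical sub-regions, and uses an identity-preserving loss and an age-accuracy loss to constrain the identity and the age of the generated image, respectively. The method draws on global topological-priority perception and helps improve the quality of local details in facial age synthesis. Texture is high-frequency information, and in image processing the wavelet transform is a classic means of converting spatial-domain information into frequency-domain information. The second model therefore combines the wavelet transform with a generative adversarial network, turning facial image age synthesis into a wavelet coefficient prediction problem, which helps generate more realistic and finer texture. Experiments show that introducing local pathways and the wavelet transform effectively models local aging details and the overall trend of texture aging.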
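3. Two facial image age analysis methods are proposed: an age estimation algorithm based on adaptive label distribution learning, and a unified framework for facial image age analysis based on a disentangled adversarial variational autoencoder. Facial aging is a continuous process. The rate of aging differs across races, genders, and age groups, and the resulting distributions are complex, so they cannot be assumed in advance or expressed in a single unified form. This thesis therefore proposes an age estimation algorithm based on adaptive label distribution learning, consisting of two joint refinement processes: label distribution refinement, which learns iteratively to improve the prediction of age distribution labels, and slack regression refinement, which progressively captures the correlations among age labels to improve network performance. For the second task, this thesis is the first to train a single network that covers all age-related subtasks, including age estimation, age synthesis, exemplar-based facial age synthesis, and assistance for cross-age face recognition. It uses a disentangled adversarial variational encoder to decompose the input image into identity, age, and remaining information, and designs new age, identity, and remaining-information priors to guide the disentanglement. To strengthen the learning of identity and age information, a conditional heterogeneous generator is proposed to reconstruct the input image, and a conditional introspective adversarial mechanism is designed to further improve the quality of the generated images. Experiments show that the unified framework learns and disentangles age-related information and thereby supports multiple facial-age-related subtasks.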
 

 

English abstract

Age is a natural marker with a biological basis. For thousands of years, human beings have never stopped exploring the aging process, since all kinds of demographic phenomena are closely related to age. As one of the emerging directions in computer vision, facial age synthesis and analysis has both theoretical significance and practical demand, such as age estimation and cross-age face recognition in the security field, and digital face aging and de-aging techniques in film and television. Although related research has made some progress, facial age synthesis and analysis still faces many challenges. On the one hand, the aging of human faces is influenced by various factors, including internal factors such as genetics and external factors from the living environment, and is therefore highly uncertain. On the other hand, complex changes in face pose and expression in loosely controlled environments also make face aging research difficult. Based on deep generative learning, this thesis investigates facial age from three aspects: facial image pre-processing, facial image age synthesis, and facial image age analysis.
The research results obtained in this thesis are as follows.


1. We propose two generative-learning-based facial image pre-processing methods: generative-learning-based facial pose frontalization and dual-generation-based face parsing. For the first task, we build a new large-scale multi-yaw multi-pitch high-quality database (M2FPA), containing 229 people, 62 pitch poses, and more than 790,000 images with a resolution of 1920x1080. To the best of our knowledge, M2FPA is the first publicly available database that contains precise and multiple yaw and pitch poses. In addition, we propose a simple yet effective parsing-guided discriminator, which uses the parsing map as flexible attention to capture local consistency during GAN optimization. For the second task, we propose a novel Dual-Structure Disentangling Variational Generation (D2VG) network. Benefiting from the interpretable factorized latent disentanglement of the VAE, D2VG can learn a joint structural distribution of a facial image and its corresponding parsing map. As a result, it can synthesize large-scale paired face images and parsing maps from a standard Gaussian distribution. We then use both manually annotated and synthesized data to train a face parsing model in a supervised way. Since the synthesized parsing maps contain inaccurate pixel-level labels, we introduce a coarseness-tolerant learning algorithm to handle these noisy or uncertain labels effectively. In this way, we can significantly boost the performance of face parsing.


2. We propose two methods for facial age synthesis: the Global and Local Consistent Age Generative Adversarial Network and the Wavelet-domain Global and Local Consistent Age Generative Adversarial Network. Aging texture information is usually reflected in local facial parts, such as wrinkles at the eye corners or on the forehead. In the first method, we propose a global and local consistent age generator, in which three local networks and one global network are integrated to capture both global and local texture information. Most current age synthesis methods model the aging process only in the image domain, so their outputs tend to be over-smoothed and lack textural details. Since subtle texture information is more salient and robust in the frequency domain, we introduce the wavelet transform to age synthesis. In the second method, we combine the wavelet transform with the generative adversarial network to turn facial image age synthesis into a wavelet coefficient prediction problem, which helps to generate more realistic and detailed texture information.
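
To illustrate what "predicting wavelet coefficients" means, here is a minimal sketch of a one-level 2D Haar transform and its inverse in NumPy. The abstract does not say which wavelet or how many decomposition levels the thesis uses, so the Haar basis, the single level, and the function names `haar_dwt2` and `haar_idwt2` are assumptions made for illustration only. A generator working in the wavelet domain would output the four sub-bands, and the image would then be recovered with the inverse transform.

```python
import numpy as np


def haar_dwt2(img):
    """One-level 2D Haar transform (illustrative; assumes even height and width)."""
    a = img[0::2, 0::2]  # top-left pixel of every 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    low = (a + b + c + d) / 2.0          # low-frequency approximation
    row_detail = (a + b - c - d) / 2.0   # top row minus bottom row
    col_detail = (a - b + c - d) / 2.0   # left column minus right column
    diag_detail = (a - b - c + d) / 2.0  # diagonal difference
    return low, row_detail, col_detail, diag_detail


def haar_idwt2(low, row_detail, col_detail, diag_detail):
    """Inverse of haar_dwt2: rebuild the image from its four sub-bands."""
    h, w = low.shape
    out = np.empty((2 * h, 2 * w), dtype=np.float64)
    out[0::2, 0::2] = (low + row_detail + col_detail + diag_detail) / 2.0
    out[0::2, 1::2] = (low + row_detail - col_detail - diag_detail) / 2.0
    out[1::2, 0::2] = (low - row_detail + col_detail - diag_detail) / 2.0
    out[1::2, 1::2] = (low - row_detail - col_detail + diag_detail) / 2.0
    return out
```

Smooth regions produce detail sub-bands close to zero, while fine texture such as wrinkles shows up as non-zero detail coefficients, which is one reason a loss defined on these coefficients can emphasize texture.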

3. We propose two methods for facial image age analysis: an age estimation algorithm with adaptive label distribution learning, and a unified framework for facial age analysis based on a disentangled adversarial autoencoder. Because facial aging appearance varies with gender, race, and other factors, age label distributions are often complicated and difficult to model parametrically. We therefore propose a Label Refinery Network (LRN) with two concurrent refinery processes: label distribution refinery and slack regression refinery. Label distribution refinery learns age label distributions progressively in an iterative manner. In this way, we can adaptively obtain specific age label distributions for different facial images without making strong assumptions about a fixed distribution form. To further exploit the correlations among age labels, we propose slack regression refinery, which converts age label regression into age interval regression. For the second task, we design a novel facial age prior to guide the modeling of the aging mechanism. To explore the effects of age on facial images, we propose a Disentangled Adversarial Autoencoder (DAAE) that disentangles a facial image into three independent factors: age, identity, and extraneous information. To avoid the "wash away" of age and identity information during face aging, we propose a hierarchical conditional generator that passes the disentangled identity and age embeddings to the high-level and low-level layers through class-conditional BatchNorm. Finally, a disentangled adversarial learning mechanism is introduced to boost image quality for face aging. By manipulating the age distribution, DAAE can then perform face aging to arbitrary ages. Further, given an input face image, the mean of the learned age posterior distribution can be treated as an age estimator. These results indicate that DAAE can efficiently and accurately estimate the age distribution in a disentangling manner. DAAE is the first attempt to achieve facial age analysis tasks, including face aging with arbitrary ages, exemplar-based face aging, and age estimation, in a universal framework.
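
Two ideas in the paragraph above can be made concrete with a small sketch: reading an age estimate off a predicted distribution as its mean, and regressing to an age interval instead of a single label. The abstract gives no formulas, so everything below is an assumption for illustration, including the softmax output, the 0 to 100 age range, the hinge-style interval loss, and the names `expected_age` and `interval_loss`. It is not the loss used by LRN or DAAE.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into a probability distribution over age classes."""
    z = logits - np.max(logits)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def expected_age(probs, ages):
    """Point estimate of age: the mean of the predicted age distribution."""
    return float(np.dot(probs, ages))


def interval_loss(pred_age, true_age, slack):
    """Hypothetical interval regression loss.

    Zero while the prediction lies within `slack` years of the label,
    growing linearly with the distance outside that interval.
    """
    return max(0.0, abs(pred_age - true_age) - slack)


ages = np.arange(0, 101, dtype=np.float64)  # assumed age classes 0..100
probs = softmax(np.zeros(101))              # a uniform distribution as a toy input
estimate = expected_age(probs, ages)        # mean of the uniform distribution
```

Relaxing the target to an interval means that predictions close to the label are not penalized, which is one plausible way to encode that neighboring ages look alike.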

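Keywords: face pre-processing; facial age synthesis; facial age analysis; generative learning; disentangled representation
Language: Chinese
Seven major directions, sub-direction classification: Biometric recognition
Document type: Dissertation
Item identifier: http://ir.ia.ac.cn/handle/173211/44887
Collection: Pattern Recognition Laboratory
Recommended citation
GB/T 7714
Li Peipei. Facial Image Age Synthesis and Analysis Based on Generative Learning [D]. Intelligent Building 1610, Institute of Automation. Institute of Automation, Chinese Academy of Sciences, 2021.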