Research on the Removal of Geometric and Illumination Variations in Face Synthesis
张婷 (Zhang Ting)1,2
Degree type: Doctor of Engineering
Supervisor: 胡占义 (Hu Zhanyi)
Date: 2018-05-29
Degree-granting institution: Graduate School of the Chinese Academy of Sciences
Degree conferred in: Beijing
Keywords: face recognition; face synthesis; pose and illumination correction; face de-occlusion; deep neural networks
Abstract
How to effectively remove geometric and illumination variations from face images and then synthesize frontal face images under neutral illumination is a pressing problem in face recognition research. This thesis conducts a systematic study of this problem and proposes several face-synthesis methods based on deep neural networks. The main work and contributions are as follows:
1. A deep neural network that removes geometric and illumination variations from face images is proposed to learn view-invariant facial representations and synthesize frontal face images. The network comprises two modules in series: the first learns the pose of the input face image; the second contains several sub-networks, each of which takes face images of one specific view under different illuminations as input and outputs the corresponding frontal face images under neutral illumination. The network requires modest computational resources, and only a small amount of training data is needed for it to learn the geometric variations caused by view changes. Comparative experiments on the benchmark dataset Multi-PIE show that the network outperforms four methods in the literature.
2. A two-branch deep correlation network is proposed to fuse geometric and textural features for frontal face synthesis. The network consists of a geometric branch and a textural branch, and takes a single face image under an arbitrary view as input: first, the two branches extract the geometric and textural features of the input image, respectively; then, a correlation layer is introduced to fuse the two kinds of features; finally, the fused features are used to synthesize a frontal face image under neutral illumination. Experiments on the benchmark datasets Multi-PIE and LFW show that the two-branch deep correlation network outperforms eight state-of-the-art methods in the literature.
3. A Deep Disentangling Siamese Network is proposed to synthesize frontal face images under neutral illumination from image pairs. Unlike related work that takes a single image as the network input, this network takes as input a pair of face images with random identities, poses, and illuminations, and formulates frontal face synthesis as an encoding-disentangling-decoding process on face images: first, the encoding module of the siamese network learns representations of the input image pair; then, the disentangling module decomposes each representation into separate representations of identity, pose, and illumination; finally, the decoding module transforms the identity representation into a frontal face image. Qualitative and quantitative experiments on the benchmark dataset Multi-PIE show that the deep disentangling siamese network outperforms nine state-of-the-art methods in the literature.
4. A Generative Adversarial Network that removes pose variations and occlusions from face images is proposed to recover the corresponding frontal face image from an occluded input. The network uses an encoder-decoder structure as the generator and introduces two discriminators: a global discriminator that judges the realism of the whole face image while preserving identity, and a local discriminator that judges the realism of the occluded facial region. In addition, a face semantic parsing network is introduced to enforce the consistency of facial components in the generated images. Qualitative and quantitative experiments on the benchmark dataset Multi-PIE show that the generative adversarial network outperforms nine state-of-the-art methods in the literature.
Other Abstract
How to effectively remove view-point and illumination changes in face images and then synthesize frontal face images under neutral illumination is a challenging problem in face recognition. To address this problem, this thesis proposes several face-synthesis methods based on deep neural networks (DNNs). The main contributions include:
1. A novel deep neural network that can remove geometric and illumination changes in face images is proposed to learn view-invariant facial representations and then synthesize frontal face images. The proposed network consists of two modules: the first learns the poses of the input face images, while the second consists of several sub-networks. The inputs of each sub-network are face images in the same pose under different illuminations, and the outputs are the corresponding frontal face images under neutral illumination. This network does not consume much computational resource, and the training process requires only a small amount of data. Comparative experimental results on the benchmark dataset Multi-PIE demonstrate that the proposed network outperforms four existing methods.
2. A two-stream deep correlation network is proposed to incorporate both geometric and textural characteristics for frontal face synthesis. The proposed network consists of a geometric branch and a textural branch: given an input face image with a random pose, the two branches first extract its geometric and textural features, respectively; these features are then integrated through a correlation layer; finally, the integrated feature is used to synthesize the corresponding frontal image under neutral illumination. Evaluations on the benchmark datasets Multi-PIE and LFW demonstrate the effectiveness of the proposed network compared with eight state-of-the-art methods.
3. A deep disentangling siamese network (DSN) is proposed to synthesize frontal face images under neutral illumination. Different from existing methods that take a single image as input, the proposed DSN takes a pair of images with random identities, poses, and illuminations as inputs. The frontal image synthesis problem is formulated as an encoding-disentangling-decoding process. First, the encoder in DSN learns the representations of the input pair of images. Then, the disentangling module in DSN decomposes each obtained representation into three representation vectors characterizing the identity, pose, and illumination of the input. Finally, the decoder in DSN transforms the identity representation into the output frontal image. Quantitative and qualitative evaluations on the benchmark dataset Multi-PIE demonstrate that the proposed DSN performs better than nine state-of-the-art methods.
4. A generative adversarial network is proposed to jointly remove occlusions and pose changes from input facial images and then synthesize the corresponding frontal face images. In the proposed network, the generator has an encoder-decoder structure, and two discriminators are employed: one is a global discriminator that judges the realism of the whole synthesized image while preserving its identity, and the other is a local discriminator that judges the realism of the de-occluded local region. Besides, an extra face parsing network is incorporated to impose consistency on the facial components. Quantitative and qualitative evaluations on the benchmark dataset Multi-PIE demonstrate that the proposed network outperforms nine state-of-the-art methods.
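To make the two-module design of contribution 1 concrete, the following is a minimal, hypothetical sketch (not the thesis code): a pose-estimation module picks one of several pose-specific sub-networks, which then maps the input to a frontal face. Images are toy flat vectors, "layers" are plain matrix-vector products with fixed random weights, and the discrete pose set `POSES` is an assumption for illustration.

```python
import random

POSES = [-45, -15, 0, 15, 45]  # assumed discrete yaw angles


def matvec(W, x):
    """Plain matrix-vector product standing in for a network layer."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class PoseModule:
    """Module 1: predicts the discrete pose class of the input image."""

    def __init__(self, dim, n_poses, rng):
        self.W = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                  for _ in range(n_poses)]

    def predict(self, x):
        scores = matvec(self.W, x)
        return max(range(len(scores)), key=scores.__getitem__)


class SubNetwork:
    """Module 2: one sub-network per pose; maps an image to its frontal counterpart."""

    def __init__(self, dim, rng):
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                  for _ in range(dim)]

    def frontalize(self, x):
        return matvec(self.W, x)


class PoseAwareSynthesizer:
    """Chains the two modules: estimate pose, then route to that pose's sub-network."""

    def __init__(self, dim, rng):
        self.pose_module = PoseModule(dim, len(POSES), rng)
        self.subnets = [SubNetwork(dim, rng) for _ in POSES]

    def __call__(self, x):
        idx = self.pose_module.predict(x)                    # stage 1: estimate pose
        return POSES[idx], self.subnets[idx].frontalize(x)   # stage 2: route

rng = random.Random(0)
net = PoseAwareSynthesizer(dim=8, rng=rng)
image = [rng.uniform(0.0, 1.0) for _ in range(8)]
pose, frontal = net(image)
```

The routing step is what keeps each sub-network small: every sub-network only ever sees images of one view, which is consistent with the claim that little training data is needed per pose.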
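The correlation layer of contribution 2 can be sketched as follows; this is a hypothetical toy version, not the thesis code. The two "branches" are stand-in feature extractors, and the fusion shown here is one common reading of a correlation layer: all pairwise products of the two feature vectors (a flattened outer product).

```python
def branch(x, weight):
    """Stand-in for a branch network: a toy scale-and-shift feature extractor."""
    return [weight * v + 0.1 for v in x]


def correlation_layer(g, t):
    """Fuse geometric features g and textural features t by taking every
    pairwise product g_i * t_j, i.e. a flattened outer product."""
    return [gi * tj for gi in g for tj in t]


def synthesize(fused, out_dim):
    """Stand-in decoder: average-pool the fused feature down to out_dim values."""
    step = len(fused) // out_dim
    return [sum(fused[i * step:(i + 1) * step]) / step for i in range(out_dim)]

image = [0.2, 0.4, 0.6, 0.8]
g = branch(image, weight=1.5)    # geometric branch
t = branch(image, weight=0.5)    # textural branch
fused = correlation_layer(g, t)  # 4 x 4 = 16 fused features
frontal = synthesize(fused, out_dim=4)
```

The point of the pairwise-product fusion is that every geometric feature modulates every textural feature, rather than the two vectors being merely concatenated.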
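The encoding-disentangling-decoding pipeline of contribution 3 can be illustrated with the toy sketch below (hypothetical, not the thesis code; the latent dimensions are assumptions). A shared siamese encoder maps each image of the pair to a latent code, the disentangling step splits the code into identity, pose, and illumination parts, and the decoder reconstructs a frontal image from the identity part alone.

```python
ID_DIM, POSE_DIM, ILLUM_DIM = 4, 2, 2
LATENT_DIM = ID_DIM + POSE_DIM + ILLUM_DIM


def encode(x):
    """Shared siamese encoder (toy): tile the input up to LATENT_DIM values."""
    return [x[i % len(x)] * 0.5 for i in range(LATENT_DIM)]


def disentangle(z):
    """Split the latent code into identity, pose, and illumination parts."""
    return (z[:ID_DIM],
            z[ID_DIM:ID_DIM + POSE_DIM],
            z[ID_DIM + POSE_DIM:])


def decode(z_id, out_dim):
    """Toy decoder: map the identity part alone to a frontal image."""
    return [z_id[i % len(z_id)] for i in range(out_dim)]

# A pair of images with (notionally) random identities, poses, and illuminations.
pair = ([0.1, 0.2, 0.3, 0.4], [0.9, 0.8, 0.7, 0.6])
for image in pair:
    z = encode(image)                       # same encoder weights for both inputs
    z_id, z_pose, z_illum = disentangle(z)  # factorize the representation
    frontal = decode(z_id, out_dim=4)       # identity alone drives the output
```

Because only `z_id` reaches the decoder, pose and illumination information is explicitly excluded from the synthesized image, which is the core idea behind disentangling for frontalization.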
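The generator's training signal in contribution 4 combines three terms. The sketch below is a hypothetical illustration, not the thesis code, and the equal loss weights are assumptions: a non-saturating adversarial term for the global discriminator's score on the whole face, the same for the local discriminator's score on the de-occluded region, and a parsing-consistency term on facial-component maps.

```python
import math


def adversarial_term(realism_score):
    """Non-saturating generator loss for one discriminator's score in (0, 1]."""
    return -math.log(realism_score)


def parsing_term(pred_map, target_map):
    """Mean squared error between predicted and target parsing maps."""
    return sum((p - t) ** 2 for p, t in zip(pred_map, target_map)) / len(pred_map)


def generator_loss(global_score, local_score, pred_map, target_map,
                   w_global=1.0, w_local=1.0, w_parse=1.0):
    """Weighted sum of the global, local, and parsing-consistency terms."""
    return (w_global * adversarial_term(global_score)
            + w_local * adversarial_term(local_score)
            + w_parse * parsing_term(pred_map, target_map))

# A perfectly convincing, perfectly parsed sample incurs zero loss:
loss = generator_loss(1.0, 1.0, [0, 1, 2], [0, 1, 2])
```

Keeping the local adversarial term separate from the global one is what lets the network sharpen the in-filled occlusion region without distorting the rest of the face.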
Document type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/21060
Collection: Graduates / Doctoral Dissertations
Affiliations: 1. Institute of Automation, Chinese Academy of Sciences
2. University of Chinese Academy of Sciences
Recommended citation (GB/T 7714):
张婷. 人脸合成中的去几何和光照变化研究[D]. 北京: 中国科学院研究生院, 2018.
Files in this item
File name (size) | Document type | Version | Access | License
论文最终版_IR.pdf (5838 KB) | Thesis | | Restricted (request full text) | CC BY-NC-SA