Title: Knowledge and Data Jointly Driven Facial Behavior Analysis and Face Cartoon Synthesis
Author: Yong Zhang 1,2
Degree type: Doctor of Engineering
Supervisor: Bao-Gang Hu
2018-05-31
Degree-granting institution: University of Chinese Academy of Sciences
Place of degree conferral: Beijing
Keywords: prior knowledge; weakly supervised learning; facial action units; cartoon synthesis; deep convolutional neural networks; multi-instance learning; feature learning
Abstract

This thesis exploits domain-specific prior knowledge to address model training when annotations are scarce, and to address similarity matching between images of different styles. We study two problems, facial behavior analysis and face cartoon synthesis, and incorporate domain priors into model learning to improve performance. The contributions of this thesis are as follows:

(1) Facial action unit intensity estimation with a bilateral ordinal multi-instance regression model. We propose training a frame-level intensity regressor from weakly labeled videos: only the AU intensities of the first and last frames of each sequence need to be annotated, which reduces annotation cost. We view AU intensity estimation from a new perspective and formulate it as a multi-instance regression problem. Unlike previous work, we introduce the notion of relevance, i.e., each frame has a degree of relevance to the first-frame label and to the last-frame label, and propose a multi-instance regression model in which each bag carries two bag labels. Prior knowledge builds connections between labeled and unlabeled frames and provides weak supervision that mitigates overfitting to the limited annotations. The priors include smoothness of AU intensity, ordinality of relevance, and smoothness of relevance.

(2) Facial action unit intensity estimation with joint representation and estimator learning. Prediction accuracy depends on the image representation, the annotations, and the model parameters. Previous intensity estimation methods learn the representation and the estimator separately, yet the learned features may not be optimal for estimator learning. Moreover, supervised learning requires many annotations, and labeling AU intensities demands strong domain expertise, making annotation difficult and costly. We propose a unified framework that can embed several types of prior knowledge and jointly learn the image representation and the model parameters from limited annotations. The priors include feature smoothness, intensity smoothness, temporal ordinality of intensity, and non-negativity of intensity. They are divided into hard constraints, which enter the optimization problem as constraints, and soft constraints, which enter the objective as regularization terms. We design an efficient optimization algorithm based on the Alternating Direction Method of Multipliers (ADMM) to solve the resulting model.

(3) Facial action unit intensity estimation with weakly supervised deep convolutional neural networks. We propose a knowledge-based weakly supervised deep learning method for AU intensity estimation that needs only a very small amount of labeled data: in each training video, only the locations of intensity peaks and valleys and their AU intensity values are annotated. To provide weak supervision, we extract several types of prior information, including relative feature similarity, temporal ordinality of intensity, facial symmetry, and contrast with the neutral face. We design differentiable loss functions over 5-element tuples for these priors, exploiting higher-order relationships among multiple frames rather than training on individual frames.

(4) Facial action unit classifier learning with prior probabilities. We propose a method that trains AU classifiers using prior probabilities, requiring no AU annotations but only expression labels, which are much easier to obtain than AU labels. We systematically derive AU-related prior probabilities from facial anatomy and emotion studies, covering both expression-dependent and expression-independent relations among AUs. We define different loss functions for the different types of priors and build a model that jointly learns the labels and the classifiers. Based on an alternating optimization framework, we propose an algorithm that iteratively learns the labels and the model parameters.

(5) Knowledge and data jointly driven face cartoon synthesis. We propose a system that automatically generates cartoon faces in multiple styles, comprising face parsing, cartoon component selection, and cartoon face composition. Given a face image, the generated cartoon should both resemble the original face and be attractive. Because real faces and cartoon faces lie in different domains, measuring similarity directly with texture-like features yields poor matches; moreover, the attractiveness of a cartoon is hard to quantify directly. We address both issues with prior knowledge, namely human judgments of cross-domain image similarity and of attractiveness. The knowledge is embedded into a dataset through human annotation, turning the difficult direct matching problem into a tractable indirect one, and a distribution describing cartoon-face attractiveness is learned from the annotated data.


In this thesis, we exploit domain-specific prior knowledge to alleviate the consequences of the lack of annotations and to provide a strategy for similarity matching of images from two different domains. We study two problems, i.e., facial behavior analysis and face cartoon synthesis, and incorporate domain knowledge into model learning to improve performance. The main contributions are summarized as follows:

(1) Bilateral ordinal relevance multi-instance regression for AU intensity estimation.  We propose a novel weakly supervised regression model, Bilateral Ordinal Relevance Multi-instance Regression (BORMIR), which learns a frame-level intensity estimator with weakly labeled sequences. From a new perspective, we introduce relevance to model sequential data and consider two bag labels for each bag.  The AU intensity estimation is formulated as a joint regressor and relevance learning problem.  Temporal dynamics of both relevance and AU intensity are leveraged to build connections among labeled and unlabeled image frames to provide weak supervision.  
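The two-bag-label idea above can be sketched with a toy loss. This is a simplified, hypothetical illustration of the BORMIR formulation, not the thesis implementation: each sequence carries only its first- and last-frame intensity labels, per-frame predictions are tied to both labels through relevance weights, and ordinal and smoothness priors on relevance and intensity supply the weak supervision.

```python
# Toy sketch of a bilateral multi-instance regression loss (hypothetical
# simplification of BORMIR): one bag = one sequence with two bag labels.
import numpy as np

def bormir_toy_loss(pred, y_first, y_last, rel_first, rel_last, lam=0.1):
    """pred: (T,) frame-level intensity predictions for one sequence.
    rel_first / rel_last: (T,) relevance of each frame to the two bag labels."""
    # Bag-level fit: relevance-weighted predictions should reproduce
    # the two endpoint annotations.
    bag_first = np.sum(rel_first * pred) / np.sum(rel_first)
    bag_last = np.sum(rel_last * pred) / np.sum(rel_last)
    fit = (bag_first - y_first) ** 2 + (bag_last - y_last) ** 2
    # Ordinal prior: relevance to the first label should decay over time,
    # relevance to the last label should grow (hinge on violations).
    ord_pen = np.sum(np.maximum(0.0, np.diff(rel_first))) \
            + np.sum(np.maximum(0.0, -np.diff(rel_last)))
    # Smoothness prior on the predicted intensity trajectory.
    smooth = np.sum(np.diff(pred) ** 2)
    return fit + lam * (ord_pen + smooth)

# A smooth, monotone trajectory scores lower than a noisy one
# under the same endpoint labels and relevances.
T = 5
rel_f = np.linspace(1.0, 0.0, T)
rel_l = np.linspace(0.0, 1.0, T)
good = bormir_toy_loss(np.linspace(0.0, 3.0, T), 0.0, 3.0, rel_f, rel_l)
bad = bormir_toy_loss(np.array([0.0, 3.0, 0.5, 2.8, 3.0]), 0.0, 3.0, rel_f, rel_l)
```

In the actual model the relevances are learned jointly with the regressor; here they are fixed to linear ramps purely for illustration.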

(2) A framework of knowledge-assisted joint representation and estimator learning for AU intensity estimation. The performance of AU intensity estimation depends on image representation, intensity estimator, and supervisory information. Most existing methods focus on estimator learning by using fully annotated databases, regardless of representation learning and the difficulty of annotating a large database. We propose a novel general framework for AU intensity estimation, which can simultaneously learn the representation and estimator with limited annotations and, more importantly, can flexibly incorporate human knowledge, such as feature smoothness, label smoothness, label ranking, and positive label. Domain knowledge can be represented as soft and hard constraints, which are encoded as regularization terms and equality or inequality constraints respectively. We also propose an efficient algorithm for optimization based on Alternating Direction Method of Multipliers.
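The split between soft and hard constraints can be illustrated with a toy curve-fitting problem. This is a minimal sketch under invented settings, not the ADMM solver of the thesis: label smoothness enters the objective as a regularizer (soft constraint), while non-negativity of intensities is enforced by projection (hard constraint).

```python
# Toy sketch: recover a full intensity track from two annotated frames,
# mixing a soft smoothness prior with a hard non-negativity constraint.
# Projected gradient descent stands in for the thesis's ADMM solver.
import numpy as np

def fit_intensities(y_obs, mask, lam=1.0, steps=3000, lr=0.05):
    """y_obs: (T,) observations; mask: (T,) 1 where annotated, 0 elsewhere."""
    z = np.zeros_like(y_obs)
    for _ in range(steps):
        # Gradient of the labeled-frame data fit plus the smoothness term.
        grad = 2 * mask * (z - y_obs)
        grad[:-1] += 2 * lam * (z[:-1] - z[1:])
        grad[1:] += 2 * lam * (z[1:] - z[:-1])
        z -= lr * grad
        z = np.maximum(z, 0.0)  # hard constraint: AU intensities are >= 0
    return z

# Only frames 0 and 4 are labeled; the smoothness prior fills the gap
# with a roughly linear ramp between the two annotations.
y = np.array([0.0, 0.0, 0.0, 0.0, 2.0])
m = np.array([1.0, 0.0, 0.0, 0.0, 1.0])
z = fit_intensities(y, m)
```

The same pattern scales to the joint representation-and-estimator objective: soft priors become additional regularization terms, hard priors become projection or constraint steps inside the solver.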

(3) Weakly-supervised deep convolutional neural network learning for AU intensity estimation. We propose a novel knowledge-based semi-supervised deep convolutional neural network for AU intensity estimation with extremely limited AU annotations. Only the intensity annotations of peak and valley frames in training sequences are needed. To provide additional supervision for model learning, we exploit naturally existing constraints on AUs, including relative appearance similarity, temporal intensity ordering, facial symmetry, and contrastive appearance difference. The knowledge builds connections between labeled and unlabeled samples. We propose to use 5-element tuples for model learning instead of individual frames or frame pairs, which leverages high-order relationships among multiple frames.
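The tuple-based supervision can be sketched as follows. This is a toy version with made-up numbers, not the thesis network: a 5-element tuple of frames is sampled between a valley and a peak, the two endpoints carry the only explicit labels, and a hinge term turns the temporal-ordering prior into a differentiable loss.

```python
# Toy sketch of a 5-tuple ordinal loss for weakly supervised
# intensity estimation (hypothetical simplification).
import numpy as np

def tuple_loss(pred, y_valley, y_peak, margin=0.0):
    """pred: (5,) predicted intensities for frames [valley, t1, t2, t3, peak]."""
    # Supervised terms on the two labeled endpoint frames only.
    sup = (pred[0] - y_valley) ** 2 + (pred[-1] - y_peak) ** 2
    # Ordinal hinge: consecutive predictions should be non-decreasing
    # between a valley and the following peak.
    rank = np.sum(np.maximum(0.0, pred[:-1] - pred[1:] + margin))
    return sup + rank

# A correctly ordered tuple incurs no loss; a shuffled one is penalized
# even though three of its five frames are unlabeled.
ordered = np.array([0.0, 0.5, 1.0, 1.8, 2.0])
shuffled = np.array([0.0, 1.8, 0.5, 2.0, 1.0])
loss_ok = tuple_loss(ordered, 0.0, 2.0)
loss_bad = tuple_loss(shuffled, 0.0, 2.0)
```

In the full method, analogous differentiable terms encode relative appearance similarity, facial symmetry, and the contrast with the neutral face, all evaluated on the same tuples.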

(4) Classifier learning with prior probabilities for AU recognition. We propose a knowledge driven method for jointly learning multiple AU classifiers without any AU annotation by leveraging prior probabilities on AUs, including expression-independent and expression-dependent AU probabilities. These prior probabilities are drawn from facial anatomy and emotion studies, and are independent of datasets. We incorporate the prior probabilities on AUs as the constraints into the objective function of multiple AU classifiers, and develop an efficient learning algorithm to solve the formulated problem.
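The use of prior probabilities as a substitute for AU labels can be sketched with a toy loss. The AU names and prior values below are made-up placeholders, not the ones derived in the thesis: for images sharing an expression label, the mean predicted AU activation is pushed toward a generic expression-dependent prior.

```python
# Toy sketch: supervising AU classifiers with expression-dependent
# prior probabilities instead of per-image AU labels.
import numpy as np

# P(AU active | expression) -- placeholder priors for two AUs.
PRIORS = {"happy": {"AU12": 0.9, "AU4": 0.1}}

def prior_loss(probs, expression, priors=PRIORS):
    """probs: dict AU -> (N,) predicted activation probabilities for
    N images that all carry the same expression label."""
    total = 0.0
    for au, p in priors[expression].items():
        # Penalize deviation of the batch-mean activation from the prior.
        total += (np.mean(probs[au]) - p) ** 2
    return total

# Predictions consistent with the priors score near zero; predictions
# that invert the expected AU pattern for "happy" are penalized.
good = prior_loss({"AU12": np.array([0.95, 0.85]),
                   "AU4": np.array([0.10, 0.10])}, "happy")
bad = prior_loss({"AU12": np.array([0.20, 0.10]),
                  "AU4": np.array([0.90, 0.80])}, "happy")
```

The thesis additionally encodes expression-independent relations among AUs (e.g., co-occurrence and mutual exclusion) as further loss terms and alternates between updating pseudo-labels and classifier parameters.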

(5) Knowledge and data driven face cartoon synthesis. We design a system that automatically generates cartoon faces in different styles, including face parsing, cartoon component selection, and cartoon component composition. Given a portrait, the stylized cartoon face should keep the characteristics of the portrait while remaining attractive. Because similarity matching with texture- or color-based features performs poorly across domains, and attractiveness is difficult to quantify, we exploit human knowledge of the similarity between images in different domains and of the attractiveness of cartoon faces to bridge the gap. The knowledge is encoded into the data through human annotation; similarity matching can then be performed indirectly, and the distribution of attractiveness can be learned from the data.
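The indirect-matching idea can be sketched in a few lines. The attribute names and values below are invented for illustration, not taken from the thesis system: since real and cartoon facial components live in different visual domains, both are described by human-annotated attributes and matched in that shared space rather than by raw texture.

```python
# Toy sketch of cross-domain component selection via annotated attributes.
import numpy as np

# Human-annotated attributes for cartoon eyebrow components:
# [thickness, arch] on a 0..1 scale (invented placeholder data).
CARTOON_EYEBROWS = {
    "brow_a": np.array([0.9, 0.2]),  # thick, flat
    "brow_b": np.array([0.2, 0.8]),  # thin, arched
}

def select_component(face_attrs, components):
    """Pick the cartoon component whose annotated attributes lie closest
    to the attributes parsed from the real face."""
    return min(components,
               key=lambda k: np.linalg.norm(components[k] - face_attrs))

# A face parsed as having thick, flat eyebrows is matched to "brow_a"
# without ever comparing real and cartoon pixels directly.
choice = select_component(np.array([0.8, 0.3]), CARTOON_EYEBROWS)
```

The attractiveness side works analogously: human ratings collected on composed cartoons provide the data from which a distribution over attractive configurations is learned.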

 
Document type: Doctoral thesis
Identifier: http://ir.ia.ac.cn/handle/173211/20979
Collection: Graduates / Doctoral theses
Author affiliations:
1. Institute of Automation, Chinese Academy of Sciences
2. University of Chinese Academy of Sciences
Recommended citation (GB/T 7714):
张勇. 知识与数据共同驱动的面部行为分析与人脸卡通画合成[D]. 北京: 中国科学院大学, 2018.
Files in this item:
Thesis_yongzhang_fin (11332 KB) — Document type: thesis; Access: not yet open; License: CC BY-NC-SA (request full text)
 

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.