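|Keywords||prior knowledge, weakly supervised learning, facial action units, cartoon synthesis, deep convolutional neural networks, multi-instance learning, feature learning|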
In this dissertation, we exploit domain-specific prior knowledge to mitigate the lack of annotations and to provide a strategy for matching images across two different domains. We study two problems: facial behavior analysis and face cartoon synthesis. In both, we incorporate domain knowledge into model learning to improve performance. The main contributions are summarized as follows:
(1) Bilateral ordinal relevance multi-instance regression for action unit (AU) intensity estimation. We propose a novel weakly supervised regression model, Bilateral Ordinal Relevance Multi-instance Regression (BORMIR), which learns a frame-level intensity estimator from weakly labeled sequences. Taking a new perspective, we introduce relevance to model sequential data and assign two bag labels to each bag. AU intensity estimation is formulated as a joint regressor and relevance learning problem. The temporal dynamics of both relevance and AU intensity are leveraged to build connections between labeled and unlabeled frames, which provides weak supervision.
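To make the notion of a bag with two labels concrete, here is a minimal Python sketch. It assumes (our assumption, not stated above) that a bag is the segment between two consecutive annotated key frames, such as a valley and the next peak, and that intensity changes monotonically inside it. The names `BilateralBag`, `build_bilateral_bags`, and `ordinal_violation` are hypothetical. The sketch omits BORMIR's relevance variables and regressor and only shows how two bag labels plus a temporal ordering can constrain the unlabeled frames between them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BilateralBag:
    """A segment bounded by two annotated frames (hypothetical structure)."""
    start: int            # index of the annotated frame opening the segment
    end: int              # index of the annotated frame closing the segment
    start_label: float    # intensity annotated at `start`
    end_label: float      # intensity annotated at `end`

    @property
    def frames(self) -> list[int]:
        # Every frame in the segment, labeled endpoints included.
        return list(range(self.start, self.end + 1))

    @property
    def trend(self) -> int:
        # +1 for an onset (valley -> peak), -1 for an offset, 0 if flat.
        if self.end_label > self.start_label:
            return 1
        if self.end_label < self.start_label:
            return -1
        return 0


def build_bilateral_bags(key_labels: dict[int, float]) -> list[BilateralBag]:
    """Split a sequence into bags at its annotated key frames."""
    keys = sorted(key_labels)
    return [
        BilateralBag(a, b, key_labels[a], key_labels[b])
        for a, b in zip(keys, keys[1:])
    ]


def ordinal_violation(bag: BilateralBag, preds: list[float]) -> float:
    """Hinge penalty on consecutive frames that move against the bag's trend."""
    total = 0.0
    for t in bag.frames[:-1]:
        step = preds[t + 1] - preds[t]
        total += max(0.0, -bag.trend * step)  # flat bags (trend 0) add nothing
    return total


# Usage: a 10-frame sequence annotated only at frames 0, 4 and 9.
bags = build_bilateral_bags({0: 0.0, 4: 3.0, 9: 1.0})
preds = [0.0, 1.0, 0.5, 2.0, 3.0, 2.5, 2.0, 1.5, 1.2, 1.0]
violations = [ordinal_violation(b, preds) for b in bags]
```

Here the first bag (an onset) is penalized by 0.5 for the dip between frames 1 and 2, while the second bag (an offset) decreases throughout and incurs no penalty.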
(2) A framework of knowledge-assisted joint representation and estimator learning for AU intensity estimation. The performance of AU intensity estimation depends on the image representation, the intensity estimator, and the supervisory information. Most existing methods focus on learning the estimator from fully annotated databases, ignoring both representation learning and the difficulty of annotating a large database. We propose a novel general framework for AU intensity estimation that simultaneously learns the representation and the estimator with limited annotations and, more importantly, can flexibly incorporate human knowledge such as feature smoothness, label smoothness, label ranking, and positive label. Domain knowledge is represented as soft and hard constraints, which are encoded as regularization terms and as equality or inequality constraints, respectively. We also propose an efficient optimization algorithm based on the Alternating Direction Method of Multipliers (ADMM).
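The split between soft and hard constraints can be illustrated on a deliberately reduced problem. The Python sketch below is not the thesis's framework. It drops representation learning and estimates per-frame intensities `y` directly. Label smoothness acts as a soft quadratic regularizer, and, as an assumed stand-in for a hard inequality constraint, every intensity must stay inside a 0 to 5 range. The problem is to minimize 0.5 * sum over labeled t of (y_t - label_t)^2 + 0.5 * lam * sum over t of (y_{t+1} - y_t)^2 subject to 0 <= y_t <= 5. Scaled-form ADMM splits `y` from a copy `z` that carries the box constraint. The function names and the parameters `lam` and `rho` are hypothetical.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[i] multiplies x[i-1], upper[i] multiplies x[i+1]."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def admm_smooth_box(n, labels, lam=1.0, rho=1.0, lo=0.0, hi=5.0, iters=500):
    """Soft label smoothness plus an assumed hard box constraint, solved by ADMM.

    labels maps frame index -> annotated intensity (e.g. valley and peak frames).
    """
    # y-update system: (M + lam * L + rho * I) y = M * target + rho * (z - u),
    # where M marks labeled frames and L is the Laplacian of the frame chain.
    mask = [1.0 if t in labels else 0.0 for t in range(n)]
    target = [labels.get(t, 0.0) for t in range(n)]
    degree = [(t > 0) + (t < n - 1) for t in range(n)]
    diag = [mask[t] + lam * degree[t] + rho for t in range(n)]
    off = [-lam] * n  # same coefficient above and below the diagonal
    z = [0.0] * n
    u = [0.0] * n
    for _ in range(iters):
        rhs = [mask[t] * target[t] + rho * (z[t] - u[t]) for t in range(n)]
        y = solve_tridiagonal(off, diag, off, rhs)               # soft terms
        z = [min(hi, max(lo, y[t] + u[t])) for t in range(n)]    # project onto box
        u = [u[t] + y[t] - z[t] for t in range(n)]               # scaled dual update
    return z


# Usage: 7 frames, only the valley (frame 0) and peak (frame 6) annotated.
intensities = admm_smooth_box(7, {0: 0.0, 6: 4.0})
```

With `lam=1.0` the smoothness term pulls the two annotated endpoints toward each other (to about 0.5 and 3.5) and fills the unlabeled frames by linear interpolation. Because the hard constraint enters only through the projection in the `z`-update, replacing it with another constraint would change only that step, provided its projection is cheap to compute.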
(3) Weakly supervised deep convolutional neural network learning for AU intensity estimation. We propose a novel knowledge-based semi-supervised deep convolutional neural network for AU intensity estimation with extremely limited AU annotations: only the intensities of the peak and valley frames in the training sequences need to be annotated. To provide additional supervision for model learning, we exploit naturally existing constraints on AUs, including relative appearance similarity, temporal intensity ordering, facial symmetry, and contrastive appearance difference. This knowledge builds connections between labeled and unlabeled samples. We train the model on 5-element tuples instead of individual frames or frame pairs, which leverages high-order relationships among multiple frames.
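The benefit of tuples over pairs shows up when counting constraints: five frames in temporal order yield ten ordered pairs, while a frame pair yields only one. The Python sketch below hypothetically assumes that a tuple consists of five frames sampled in temporal order from an annotated valley-to-peak segment. It combines a temporal-ordering hinge over all ten pairs with a facial-symmetry term that asks predictions on horizontally flipped faces to match those on the originals. It covers only two of the four constraints listed above. The function names, `margin`, and `w_sym` are illustrative. In practice these losses would be computed on CNN outputs inside an autodiff framework rather than on plain floats.

```python
import random
from itertools import combinations


def sample_tuple(segment, k=5, rng=None):
    """Draw k distinct frame indices from a valley-to-peak segment, in temporal order."""
    rng = rng or random.Random()
    return sorted(rng.sample(segment, k))


def ordering_loss(preds, margin=0.0):
    """Hinge over every ordered pair (i < j): later frames should not score lower."""
    return sum(max(0.0, margin - (preds[j] - preds[i]))
               for i, j in combinations(range(len(preds)), 2))


def symmetry_loss(preds, preds_flipped):
    """Mean squared gap between predictions on original and mirrored faces."""
    return sum((a - b) ** 2 for a, b in zip(preds, preds_flipped)) / len(preds)


def tuple_loss(preds, preds_flipped, margin=0.0, w_sym=1.0):
    """Ordering term plus weighted symmetry term for one 5-element tuple."""
    return ordering_loss(preds, margin) + w_sym * symmetry_loss(preds, preds_flipped)


# Usage: indices from a 30-frame onset segment, then toy predictions for one tuple.
frames = sample_tuple(list(range(30)), rng=random.Random(0))
preds = [0.0, 2.0, 1.0, 3.0, 4.0]
flipped = [0.0, 2.0, 1.0, 3.0, 4.0]
n_pairs = len(list(combinations(range(5), 2)))
loss = tuple_loss(preds, flipped)
```

The only out-of-order pair here is (2.0, 1.0), so the ordering term is 1.0 and the symmetry term is zero.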
(4) Classifier learning with prior probabilities for AU recognition. We propose a knowledge-driven method that jointly learns multiple AU classifiers without any AU annotations by leveraging prior probabilities on AUs, including expression-independent and expression-dependent AU probabilities. These prior probabilities are drawn from facial anatomy and emotion studies and are independent of any particular dataset. We incorporate them as constraints into the objective function of the multiple AU classifiers and develop an efficient learning algorithm to solve the resulting problem.
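One simple way to turn expression-dependent AU probabilities into a training signal is to require that, for each expression, the average predicted probability of each AU over that expression's samples matches its prior. The Python sketch below does this with logistic AU classifiers trained by gradient descent. It is an assumed illustration of the idea, not the constrained formulation or learning algorithm developed in the thesis, and it covers only expression-dependent priors. It assumes that expression labels, but no AU labels, are available for the training images. The prior values, features, and names (`prior_loss_and_grad`, `PRIORS`) are made up. The pairing of AU12 with happiness and AU2 with surprise follows common FACS-based descriptions of these expressions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def prior_loss_and_grad(weights, groups, priors):
    """Squared gap between mean predicted AU probability per expression and its prior.

    weights: one weight vector per AU; groups: expression -> list of feature vectors;
    priors: expression -> list of P(AU_j = 1 | expression).
    """
    loss = 0.0
    grads = [[0.0] * len(w) for w in weights]
    for expr, xs in groups.items():
        n = len(xs)
        for j, w in enumerate(weights):
            probs = [sigmoid(sum(wi * xi for wi, xi in zip(w, x))) for x in xs]
            gap = sum(probs) / n - priors[expr][j]
            loss += gap * gap
            for x, p in zip(xs, probs):
                coef = 2.0 * gap * p * (1.0 - p) / n  # chain rule through the mean
                for d, xd in enumerate(x):
                    grads[j][d] += coef * xd
    return loss, grads


# Made-up priors for two AUs [AU2, AU12] under two expressions.
PRIORS = {"happiness": [0.1, 0.9], "surprise": [0.9, 0.1]}

rng = random.Random(0)
centers = {"happiness": (1.0, -1.0), "surprise": (-1.0, 1.0)}
# Toy features: a bias term plus two noisy coordinates per sample; no AU labels.
groups = {
    expr: [[1.0, rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)] for _ in range(20)]
    for expr, (cx, cy) in centers.items()
}

weights = [[0.0] * 3 for _ in PRIORS["happiness"]]
initial_loss, _ = prior_loss_and_grad(weights, groups, PRIORS)
lr = 0.5
for _ in range(500):
    loss, grads = prior_loss_and_grad(weights, groups, PRIORS)
    weights = [[w - lr * g for w, g in zip(wj, gj)] for wj, gj in zip(weights, grads)]
final_loss, _ = prior_loss_and_grad(weights, groups, PRIORS)
```

With all weights at zero, every predicted probability is 0.5 and each of the four gaps is 0.4 in magnitude, so training starts from a loss of 0.64.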
(5) Knowledge- and data-driven face cartoon synthesis. We design a system that automatically generates cartoon faces in different styles through face parsing, cartoon component selection, and cartoon component composition. Given a portrait, the stylized cartoon face should preserve the characteristics of the portrait while remaining attractive. Because similarity matching performs poorly when texture- or color-based features are used directly, and because attractiveness is difficult to quantify, we bridge the gap by exploiting human knowledge about the similarity between images from different domains and about the attractiveness of cartoon faces. This knowledge is encoded into the data through human annotation, so that similarity matching can be achieved indirectly and the distribution of attractiveness can be learned from the data.
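Indirect matching can be pictured as comparing portrait and cartoon components in a shared space of human-annotated attributes rather than in pixel space, then trading that similarity off against annotated attractiveness. The Python sketch below hypothetically illustrates only this selection step. It assumes the portrait's attribute scores have already been predicted by a model trained on human annotations. The attribute values, attractiveness scores, component names, and the weight `alpha` are invented.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CartoonComponent:
    name: str
    attributes: tuple[float, ...]  # hypothetical annotated attribute scores
    attractiveness: float          # hypothetical annotated score in [0, 1]


def select_component(portrait_attrs, library, alpha=0.5):
    """Pick the component closest in attribute space, rewarding attractiveness."""
    def cost(comp: CartoonComponent) -> float:
        distance = math.dist(portrait_attrs, comp.attributes)
        return distance - alpha * comp.attractiveness

    return min(library, key=cost)


# Usage: a toy eye library described by two annotated attributes
# (say, roundness and tilt) plus an attractiveness rating.
library = [
    CartoonComponent("eyes_a", (0.9, 0.1), 0.2),
    CartoonComponent("eyes_b", (0.7, 0.3), 0.9),
    CartoonComponent("eyes_c", (0.1, 0.9), 1.0),
]
portrait = (0.85, 0.15)
closest = select_component(portrait, library, alpha=0.0)
balanced = select_component(portrait, library, alpha=0.5)
```

With `alpha=0.0` the nearest component (`eyes_a`) wins. With `alpha=0.5` a slightly less similar but much more attractive component (`eyes_b`) is chosen, which mirrors the stated goal of keeping the portrait's characteristics while ensuring attractiveness.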
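|Zhang Yong. Facial Behavior Analysis and Face Cartoon Synthesis Driven Jointly by Knowledge and Data [Dissertation]. Beijing: University of Chinese Academy of Sciences, 2018.|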