Research on Face Detection and Facial Expression Recognition Methods Based on Deep Learning

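	Research on Face Detection and Facial Expression Recognition Methods Based on Deep Learning
	Wu Wenqi (武文琦)1,2
	2018-05-25
Degree type	Doctor of Engineering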
Abstract	Face detection and facial expression recognition are key technologies in human-computer interaction and have broad application prospects in many fields. In recent years, with the continued development of deep learning methods in face-related fields, both technologies have attracted wide attention from researchers and have become active research topics in computer vision and pattern recognition. As practical demand grows, face detection and facial expression recognition still face many challenges in complex scenes, such as variations in face pose, illumination, scale, occlusion, and identity. These conditions in unconstrained environments make face detection and facial expression recognition unstable, which reduces the practical value of face technology. Achieving accurate and efficient face detection and facial expression recognition has therefore become an important research problem. This thesis studies key problems in deep-learning-based face detection and facial expression recognition. The main work and contributions are as follows:
(1) A face detection method based on the region-based convolutional neural network Faster R-CNN is proposed. When a deep convolutional neural network extracts features from small-scale faces, the features carry strong semantic information but their resolution is too low, which leads to face detection errors. To address small faces and multiple scales in face detection, this thesis proposes a step-by-step face detection method divided into two stages. In the first stage, an efficient multi-task RPN based on a cascaded Boosting face detector is proposed to improve the extraction efficiency and recall of face proposals. In the second stage, a parallel Fast R-CNN network based on proposal scale is proposed: face proposals are grouped by scale, and three corresponding Fast R-CNN networks perform detection on their respective groups. This tailors detection to the scale of each face and effectively improves face detection accuracy.
(2) A network compression and acceleration method is proposed that combines an efficient convolutional neural network, filter pruning, and binarized network parameters. The large number of parameters and the heavy computation of deep convolutional neural networks limit the range of applications of face detection methods. To address detection speed, three network compression and acceleration methods and a strategy for fusing them are proposed. An efficient convolutional neural network based on grouped pointwise convolution reduces the number of parameters through the network structure itself. A filter pruning method based on an approximate Hessian matrix uses the computed Hessian to identify low-sensitivity filters and prune them, which effectively reduces memory usage and speeds up forward propagation. A parameter simplification method based on binarization compresses the original network by reducing the number of bits needed to represent each weight. The fused, accelerated network is then applied to face detection within the step-by-step Faster R-CNN detection framework. Ablation experiments verify that the compression and acceleration strategies can be combined effectively, giving the network a better balance between speed and accuracy.
(3) A multi-pose facial expression recognition method based on special landmark localization is proposed. Because the face can be regarded as a convex, roughly spherical structure, changes in face pose cause self-occlusion, which alters the facial expression features and reduces the accuracy of expression recognition. To address the pose problem, a special landmark localization method based on a convolutional neural network is proposed, and the geometric relationships between the special landmarks are used to estimate face pose. A pose-based region-of-interest (ROI) projection and feature-map concatenation method is proposed in which different face poses correspond to different feature-map concatenation weights, so that the expression recognition network adapts to pose. A loss function based on intra-class and inter-class distance is proposed that reduces the distance between sample features and their class center while increasing the distance between classes, which makes the features of different expressions more discriminative.
(4) A facial expression recognition method based on identity information enhancement is proposed. Changes in identity confuse expression recognition: the same expression can differ considerably across people, while different expressions can appear similar. To address the resulting drop in recognition rate, identity information is used during the supervised learning of expression features to enhance their discriminability, so that the expression recognition network adapts to different identities. Identity information and expression features are combined through spatial fusion, and constrained multi-task learning is then used to enhance the identity-related features contained in the expression features. By fusing identity information into the expression recognition task, the method effectively improves the accuracy of facial expression recognition.
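Contribution (1) of the abstract groups face proposals by scale and sends each group to one of three Fast R-CNN networks. The grouping step might look like the following minimal sketch. It assumes that scale is measured as the square root of the box area and uses hypothetical boundaries of 32 and 96 pixels; the abstract states neither the scale measure nor the thresholds, and the function names are illustrative.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

# Hypothetical scale boundaries in pixels; not taken from the thesis.
SMALL_MAX = 32.0
MEDIUM_MAX = 96.0


def proposal_scale(box: Box) -> float:
    """Return the square root of the box area (an assumed scale measure)."""
    x1, y1, x2, y2 = box
    width = max(0.0, x2 - x1)
    height = max(0.0, y2 - y1)
    return (width * height) ** 0.5


def group_by_scale(proposals: List[Box]) -> Dict[str, List[Box]]:
    """Split proposals into three scale groups, one per downstream detector."""
    groups: Dict[str, List[Box]] = {"small": [], "medium": [], "large": []}
    for box in proposals:
        scale = proposal_scale(box)
        if scale < SMALL_MAX:
            groups["small"].append(box)
        elif scale < MEDIUM_MAX:
            groups["medium"].append(box)
        else:
            groups["large"].append(box)
    return groups
```

Each of the three lists would then be passed to the detector assigned to that scale range.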
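Contribution (2) of the abstract prunes filters that an approximate Hessian matrix marks as low-sensitivity. The abstract does not give the sensitivity formula, so the sketch below swaps in the classic diagonal-Hessian saliency, 0.5 * h * w^2 summed over a filter's weights, purely as an assumption. The names `filter_saliency` and `select_filters_to_prune` and the `prune_ratio` parameter are illustrative and do not come from the thesis.

```python
from typing import List


def filter_saliency(weights: List[float], hessian_diag: List[float]) -> float:
    """Second-order saliency of one filter under a diagonal Hessian approximation.

    This formula is an assumed stand-in for the thesis's own sensitivity estimate.
    """
    return 0.5 * sum(h * w * w for w, h in zip(weights, hessian_diag))


def select_filters_to_prune(
    filters: List[List[float]],
    hessian_diags: List[List[float]],
    prune_ratio: float,
) -> List[int]:
    """Return the indices of the lowest-saliency filters, in ascending index order."""
    scores = [filter_saliency(w, h) for w, h in zip(filters, hessian_diags)]
    n_prune = int(len(filters) * prune_ratio)
    ranked = sorted(range(len(filters)), key=lambda i: scores[i])
    return sorted(ranked[:n_prune])
```

Removing the selected filters shrinks both the stored weights and the work done in the forward pass, which is the effect the abstract attributes to pruning.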
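Contribution (2) of the abstract also compresses the network by reducing the number of bits used per weight. One common binarization scheme, shown here as an assumption rather than as the thesis's exact method, keeps only the sign of each weight (one bit) plus a single floating-point scale per filter equal to the mean absolute weight.

```python
from typing import List, Tuple


def binarize_filter(weights: List[float]) -> Tuple[float, List[int]]:
    """Approximate a filter as alpha * sign(w), with alpha = mean(|w|).

    The per-filter scale alpha is an assumed design choice for illustration.
    """
    if not weights:
        raise ValueError("weights must not be empty")
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


def reconstruct(alpha: float, signs: List[int]) -> List[float]:
    """Rebuild the approximate real-valued weights from the binary form."""
    return [alpha * s for s in signs]
```

With this representation each weight costs one bit instead of a 32-bit float, at the price of an approximation error that depends on how far the original weights are from a common magnitude.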
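Contribution (3) of the abstract describes a loss that pulls sample features toward their class center while pushing classes apart. The exact form is not given in the abstract. The sketch below assumes squared Euclidean distances and a simple difference, intra-class term minus `lam` times inter-class term; the weighting `lam` and the function names are hypothetical.

```python
from typing import Dict, List


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def intra_inter_loss(
    features: List[List[float]],
    labels: List[int],
    centers: Dict[int, List[float]],
    lam: float = 0.5,
) -> float:
    """Mean distance to own class center minus lam times mean center separation.

    Lower is better: compact classes and well-separated centers both reduce it.
    """
    if not features:
        raise ValueError("features must not be empty")
    intra = sum(sq_dist(f, centers[y]) for f, y in zip(features, labels)) / len(features)
    ids = sorted(centers)
    pairs = [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]]
    inter = sum(sq_dist(centers[a], centers[b]) for a, b in pairs) / len(pairs) if pairs else 0.0
    return intra - lam * inter
```

In training, such a term would typically be added to a standard classification loss, with the class centers updated alongside the network; that combination is likewise an assumption here.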
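Keywords	deep learning; face detection; facial expression recognition; network compression; deep convolutional neural network; facial landmark localization; multi-task learning
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/20990
Collection	Graduates_Doctoral Dissertations
Author affiliations	1. Institute of Automation, Chinese Academy of Sciences 2. University of Chinese Academy of Sciences
First author affiliation	Institute of Automation, Chinese Academy of Sciences
Recommended citation (GB/T 7714)	Wu Wenqi. Research on Face Detection and Facial Expression Recognition Methods Based on Deep Learning[D]. Beijing. Graduate University of Chinese Academy of Sciences, 2018.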

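Files in this item
File name/size	Document type	Version type	Access type	License
Research on Face Detection and Facial Expression Recognition Methods Based on Deep Learning (6529KB)	Thesis		Restricted access	CC BY-NC-SA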