Research on Facial Expression Recognition in Natural Scenes Based on the Attention Mechanism (基于注意力机制的自然场景表情识别技术研究)
胡申华
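Subtype: Master's
Thesis Advisor: 顾庆毅
2020-05
Degree Grantor: Institute of Automation, Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Name: Master of Engineering
Degree Discipline: Control Theory and Control Engineering
Keyword: deep learning; facial expression recognition; attention mechanism; feature dimensionality reduction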
Abstract

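    Psychological problems have become a major factor affecting the productivity and quality of life of modern people. Detecting and relieving employees' psychological problems in time has become an important means for enterprises to raise production efficiency and reduce safety hazards. Owing to subjective and objective factors, however, methods for surveying psychological problems remain costly and inefficient. As an important avenue for human emotion analysis, facial expression recognition can analyze and monitor employees' psychological state accurately and non-intrusively, which makes it a more advanced form of psychological assessment. Most expression recognition systems, however, work only in environments with fixed lighting and fixed viewing angles. In natural scenes where face pose, lighting, scale, occlusion, and identity are uncontrolled, the recognition rate of existing facial expression recognition algorithms drops. Achieving efficient and stable facial expression recognition in natural scenes has therefore become an important research topic.
The attention mechanism imitates the way the human brain processes external information: by concentrating on a subset of the input, it uses limited computing resources to pick out high-value information quickly from a large volume of data. It has been widely applied to computer vision tasks such as image classification, semantic segmentation, and image understanding, with notable results. These properties make the attention mechanism one of the important ways to raise the accuracy of expression recognition algorithms.
Our research group collaborates with a well-known Chinese company in applied psychology technology, using the link between facial expressions and psychology to apply expression recognition to non-intrusive, automated detection of employees' psychological state. This thesis studies key problems in attention-based facial expression recognition. The main work and contributions are summarized as follows: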
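(1) An expression recognition method based on fixed attention points
    This thesis proposes a technique that obtains facial feature points with traditional image processing and then uses them as attention focal points to guide a deep learning network in expression recognition. Images used for expression recognition are usually clear and complete, so they are well suited to fast, stable face detection and feature point labeling by traditional image processing methods. The feature points are then used to generate heat maps, and texture features around the feature points are extracted as feature maps; the heat maps and feature maps are fused into feature heat maps. Finally, a neural network classifies expressions from this fused information. Combining traditional and deep learning methods effectively removes the influence of illumination and scale changes on expression recognition and improves the stability and accuracy of the algorithm. On the dataset we collected, our method is 10 percentage points more accurate than commercial software, reaching 69%.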
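(2) An expression recognition method based on global attention
    Expression recognition based on fixed attention points depends on predefined feature points, which researchers propose from experience and which are not necessarily optimal for expression recognition. Moreover, it can use only the image information around the key points and cannot exploit the whole image. This thesis therefore proposes a global attention model: a reference expression generation module generates a reference expression for the expression under test, differential features are obtained by comparing the features of the test expression with those of the reference expression, and the differential features are then used for expression recognition. This method effectively removes interference from facial contours, hair, glasses, and other extraneous objects, improving the precision of the algorithm. On the public natural-scene facial expression datasets AffectNet and RAF-DB, the accuracy of our algorithm exceeds that of most researchers, reaching 55.0% and 83.5%, respectively.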
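(3) An expression recognition method based on adaptive attention
    Expression recognition based on global attention uses WGAN to generate reference expressions. WGAN, however, can only fit the data distribution of the training set; when a test sample differs too much from the training samples, the generated reference expression degrades and recognition accuracy suffers. To improve the generalization ability of expression recognition algorithms, we propose a network model called the "dimension reduction network", which places a dimension reduction module in front of a general-purpose classification network to reduce the dimensionality of the input image. The dimension reduction forces the neural network to discard some features while keeping the valuable ones. The classification error produced by the classification network is back-propagated through the gradients to the dimension reduction module, guiding it to capture expression-related features accurately. The module both shrinks the feature map, which reduces computation and increases speed, and implicitly assigns a weight to the importance of each pixel, removing invalid features while retaining as much useful information in the image as possible. The dimension reduction module is fairly general and can be placed in front of any general-purpose classification network. Even when the classification network is large, it effectively reduces the network's generalization error and improves classification accuracy. On the AffectNet dataset, our dimension reduction network achieved the best accuracy, 1.2 percentage points ahead of the runner-up.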

Other Abstract

Psychological problems have become an important issue affecting the quality of life and productivity of modern people. Finding and addressing employees' psychological problems in time has become an important way for enterprises to improve production efficiency and reduce hidden dangers. However, owing to subjective and objective factors, methods for investigating psychological problems are still costly and inefficient. As an important means of human emotion analysis, facial expression recognition can analyze and monitor employees' psychological status accurately and without intrusion, making it a more advanced method of psychological investigation. However, most expression recognition systems are suitable only for environments with fixed lighting and fixed angles. In natural scenes where face pose, lighting, scale, occlusion, and identity information are uncontrolled, the accuracy of existing facial expression recognition algorithms decreases. How to implement efficient and stable facial expression recognition algorithms in natural scenes has become an important research topic.
The attention mechanism mimics the way the human brain processes external information. By focusing on a subset of the input, it uses limited computing resources to quickly filter high-value information out of a large amount of data. It is widely used in computer vision tasks such as image classification, semantic segmentation, and image understanding, with remarkable results. These characteristics make the attention mechanism one of the important ways to improve the accuracy of expression recognition algorithms.
Our team cooperates with a well-known psychology application technology company to apply expression recognition to non-invasive, automated psychological state detection for employees, based on the connection between facial expressions and human psychology. This thesis focuses on the key issues of expression recognition based on the attention mechanism. The main work and innovations are summarized as follows:
1. An expression recognition method based on fixed attention points is proposed
We propose a method that uses traditional image processing to obtain facial feature points and then uses the feature points as attention points to guide a deep neural network for expression recognition. Images used for expression recognition are usually clear and complete, so traditional image processing methods can perform face detection and feature point marking quickly and stably. Once the face and its feature points have been detected, the feature points are used to generate heat maps, texture features around the feature points are extracted as feature maps, and the heat maps and feature maps are combined into feature heat maps. Finally, a neural network classifies expressions from the feature heat maps. Combining traditional methods and deep learning methods effectively removes the effects of illumination and scale changes and improves the robustness and accuracy of the algorithm. On the dataset we collected, the accuracy of our method is 10 percentage points higher than that of commercial software, reaching 69%.
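The heat map step can be illustrated with a minimal sketch. It assumes each feature point is rendered as a 2D Gaussian and that fusion is an element-wise weighting of the texture feature map by the heat map; the abstract specifies neither choice, and the function names, `sigma`, and the fusion rule are hypothetical.

```python
import numpy as np

def landmark_heatmap(landmarks, height, width, sigma=2.0):
    """Render (x, y) feature points as a single-channel heat map.

    Each point contributes a 2D Gaussian bump. Overlapping bumps are merged
    with a pixel-wise maximum, so values stay in [0, 1].
    The Gaussian form and sigma are assumptions, not taken from the thesis.
    """
    ys, xs = np.mgrid[0:height, 0:width]  # row (y) and column (x) index grids
    heatmap = np.zeros((height, width), dtype=np.float64)
    for x, y in landmarks:
        bump = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        heatmap = np.maximum(heatmap, bump)
    return heatmap

def feature_heatmap(feature_map, heatmap):
    """Fuse a texture feature map with the heat map by element-wise weighting.

    This fusion rule is a hypothetical stand-in for the thesis's fusion step.
    """
    return feature_map * heatmap

# Two made-up feature points on a 16x16 image.
points = [(3, 4), (10, 6)]
heat = landmark_heatmap(points, height=16, width=16)
texture = np.ones((16, 16))  # placeholder texture feature map
fused = feature_heatmap(texture, heat)
```

With this weighting, pixels far from every feature point contribute almost nothing to the fused map, which is one simple way to make the feature points act as attention focal points.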
2. An expression recognition method based on global attention is proposed
The expression recognition method based on fixed attention points relies on predefined feature points. These points are proposed by researchers based on experience and may not be optimal for expression recognition. In addition, face angle and scale vary in natural scenes, so the feature point detection algorithm may fail once the working scene changes. Last but not least, only the image information around the key points is used, while the global image information is ignored. Therefore, we propose a global attention model, which uses WGAN to generate a reference expression for the sample under test and then extracts differential features to find expression-related texture features in the global context. Finally, these texture features are used for facial expression recognition. This method effectively removes features arising from facial contours, hair, and glasses, and improves the accuracy of the algorithm. On the public natural-scene facial expression datasets AffectNet and RAF-DB, the accuracy of our algorithm exceeds that of most other researchers' methods, reaching 55.0% and 83.5%, respectively.
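A toy sketch of the differential-feature idea follows. It assumes, as an idealization, that the generated reference shares the identity-related appearance of the test face, so subtraction cancels it. The feature vectors are made-up numbers, and `differential_features` is a hypothetical helper, not the thesis's implementation.

```python
import numpy as np

def differential_features(test_features, reference_features):
    """Subtract reference-expression features from test-expression features."""
    test_features = np.asarray(test_features, dtype=np.float64)
    reference_features = np.asarray(reference_features, dtype=np.float64)
    if test_features.shape != reference_features.shape:
        raise ValueError("feature shapes must match")
    return test_features - reference_features

# Made-up feature components for illustration only.
identity = np.array([0.8, -0.3, 0.5, 0.1])    # face contour, hair, glasses
expression = np.array([0.0, 0.6, 0.0, -0.4])  # expression-specific part

test_features = identity + expression
reference_features = identity  # idealized reference with the same identity
diff = differential_features(test_features, reference_features)
```

In this idealized case `diff` equals the expression-specific component. In practice the cancellation is only as good as the generated reference.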
3. An expression recognition method based on adaptive attention is proposed
The global attention-based expression recognition method generates reference expressions with WGAN. However, WGAN can only fit the distribution of the training set. When a test sample differs too much from the training samples, the quality of the generated reference expression degrades, which reduces the accuracy of expression recognition. To improve the generalization ability of expression recognition algorithms for natural scenes, we propose a novel network model called the "dimension reduction network". This model prepends a dimension reduction module to a general classification network to reduce the dimensionality of the input image. The dimension reduction operation forces the neural network to discard some features while keeping the valuable ones. The classification error generated by the classification network reaches the dimension reduction module through gradient backpropagation and guides it to learn expression-related features. The module not only reduces the size of the feature map, which lowers the amount of computation and increases speed, but also implicitly assigns a weight to the importance of each pixel. As a result, invalid features are removed while as much useful information in the image as possible is retained. The dimension reduction module can be placed in front of any general classification network. Even if the classification network is large, it effectively reduces the generalization error of the network and improves classification accuracy. On the AffectNet dataset, our dimension reduction network obtained the best accuracy, surpassing the second-best result by 1.2 percentage points.
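One way to picture a module that shrinks the input while implicitly weighting each pixel is softmax-weighted pooling, sketched below as a forward pass only. This is an assumed stand-in, not the thesis's architecture: `weighted_downsample`, the `logits` importance map, and the block size are hypothetical. In a trained network the logits would come from learnable layers updated by the classification loss through backpropagation.

```python
import numpy as np

def weighted_downsample(image, logits, block=2):
    """Downsample a (H, W) image by `block` in each dimension.

    Within every block x block window, pixels are averaged with weights
    softmax(logits), so a higher logit means a more important pixel.
    """
    h, w = image.shape
    if logits.shape != image.shape:
        raise ValueError("logits must have the same shape as image")
    if h % block or w % block:
        raise ValueError("height and width must be divisible by block")

    def to_blocks(a):
        # (H, W) -> (H/b, W/b, b*b): one row of pixels per window
        a = a.reshape(h // block, block, w // block, block)
        return a.transpose(0, 2, 1, 3).reshape(h // block, w // block, block * block)

    img_b = to_blocks(image)
    log_b = to_blocks(logits)
    log_b = log_b - log_b.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(log_b)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return (weights * img_b).sum(axis=-1)

image = np.arange(16.0).reshape(4, 4)

# Uniform logits reduce to plain average pooling.
uniform = weighted_downsample(image, np.zeros((4, 4)))

# Strongly favoring the top-left pixel of each window keeps only that pixel.
peaked_logits = np.zeros((4, 4))
peaked_logits[::2, ::2] = 50.0
peaked = weighted_downsample(image, peaked_logits)
```

The two calls show the range of behavior: uniform weights discard information evenly, while peaked weights keep the pixels the importance map favors.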

Pages: 69
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/39106
Collection: Graduates_Master's Theses
Corresponding Author: 胡申华
Recommended Citation
GB/T 7714
胡申华. 基于注意力机制的自然场景表情识别技术研究[D]. 中国科学院自动化研究所. 中国科学院自动化研究所,2020.
Files in This Item:
File Name/Size: 毕业论文反馈修改3.pdf (4077KB)
DocType: Thesis
Access: Restricted
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.