Psychological problems have become an important issue affecting the quality of life and productivity of modern people. Detecting employees' psychological problems in time has become an important means for enterprises to improve production efficiency and reduce hidden risks. However, owing to various subjective and objective factors, methods for investigating psychological problems remain costly and inefficient. As an important approach to human emotion analysis, facial expression recognition can analyze and monitor employees' psychological status accurately and non-intrusively, which makes it a more advanced method of psychological investigation. However, most expression recognition systems are suitable only for environments with fixed lighting and fixed camera angles. In natural scenes, where face pose, lighting, scale, occlusion, and identity information are uncontrolled, the accuracy of existing facial expression recognition algorithms decreases. How to implement efficient and stable facial expression recognition algorithms in natural scenes has therefore become an important research topic.
The attention mechanism mimics the way the human brain processes external information. By focusing on a subset of the input and using limited computing resources to quickly filter high-value information out of a large amount of information, the attention mechanism has been widely used, with remarkable results, in image classification, semantic segmentation, image understanding, and other computer vision tasks. These characteristics make the attention mechanism one of the most important ways to improve the accuracy of expression recognition algorithms.
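As a rough illustration of focusing limited computation on a subset of the input, the following minimal sketch implements generic soft spatial attention with NumPy: each spatial position of a feature map is scored against a query vector, the scores are normalized with a softmax, and the feature map is pooled with those weights. The function names and the query vector are hypothetical; this is a textbook form of attention, not the specific mechanism used in this work.

```python
from typing import Tuple

import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = x - x.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


def spatial_attention_pool(features: np.ndarray, query: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Soft spatial attention over a feature map (illustrative sketch).

    features: (H, W, C) feature map; query: (C,) hypothetical learned query vector.
    Returns the attention-weighted feature vector (C,) and the attention map (H, W).
    """
    h, w, c = features.shape
    flat = features.reshape(h * w, c)       # one row per spatial position
    scores = flat @ query / np.sqrt(c)      # relevance score of each position
    weights = softmax(scores)               # non-negative weights that sum to 1
    pooled = weights @ flat                 # weighted sum dominated by high-scoring positions
    return pooled, weights.reshape(h, w)
```

Positions whose features align with the query receive almost all of the weight, so the pooled vector is built mainly from a small, relevant part of the input.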
Our team cooperates with a well-known psychology application technology company to apply expression recognition to non-invasive, automated detection of employees' psychological states, based on the connection between facial expressions and human psychology. This article focuses on the key issues of attention-based expression recognition. The main work and innovations are summarized as follows:
1. An expression recognition method based on fixed attention is proposed
We propose a method that uses traditional (non-deep-learning) image processing methods to obtain facial feature points and then uses these feature points as attention points to guide a deep neural network in expression recognition. Faces in the images are often clear and complete, so traditional image processing methods can perform face detection and feature point marking quickly and stably. Specifically, faces and facial feature points are first detected with traditional image processing methods. The feature points are then used to generate heat maps, texture features around the feature points are extracted as feature maps, and the heat maps and feature maps are combined into feature heat maps. Finally, a neural network classifies expressions from the feature heat maps. By combining traditional methods with deep learning methods, the effects of illumination and scale changes can be effectively eliminated, and the robustness and accuracy of the algorithm are improved. On the dataset we collected, our method reaches an accuracy of 69%, which is 10% higher than that of commercial software.
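The fixed-attention pipeline can be sketched as follows, assuming the facial feature points have already been detected and are given as (x, y) pixel coordinates. The abstract does not specify which texture descriptor is extracted or how the heat maps and feature maps are combined, so this sketch makes two labeled assumptions: gradient magnitude stands in for the texture feature, and the combination is a simple channel stack. All function names are hypothetical.

```python
import numpy as np


def gaussian_heatmap(shape, points, sigma=3.0):
    """Sum of 2-D Gaussians centred on each (x, y) feature point, scaled to [0, 1]."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.zeros((h, w), dtype=np.float64)
    for x, y in points:
        heat += np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
    peak = heat.max()
    return heat / peak if peak > 0 else heat


def gradient_texture(gray):
    """Gradient-magnitude map, used here as a stand-in texture descriptor (assumption)."""
    gy, gx = np.gradient(gray.astype(np.float64))  # derivatives along rows, then columns
    return np.hypot(gx, gy)


def feature_heatmap(gray, points, sigma=3.0):
    """Stack the feature-point heat map with texture restricted to the feature-point neighbourhoods.

    Returns an array of shape (2, H, W) that a CNN classifier could take as input.
    """
    heat = gaussian_heatmap(gray.shape, points, sigma)
    texture = gradient_texture(gray) * heat  # keep texture mainly around the feature points
    return np.stack([heat, texture], axis=0)
```

The Gaussian width `sigma` controls how far around each feature point texture is kept; texture far from every feature point is suppressed toward zero before it reaches the classifier.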
2. An expression recognition method based on global attention is proposed
The expression recognition method based on fixed attention relies on fixed feature points. These feature points are defined empirically by researchers and may not be optimal for expression recognition. In addition, face angle and scale vary in natural scenes, so once the working scene changes, the feature point detection algorithm may fail. Last but not least, only the image information around the feature points is used, while global image information is ignored. Therefore, we propose a global attention model, which uses a Wasserstein GAN (WGAN) to generate a reference expression for the sample under test and then extracts differential features to find expression-related texture features in the global context. Finally, these texture features are used for facial expression recognition. This method effectively removes features arising from facial contours, hair, and glasses, and improves the accuracy of the algorithm. On the public in-the-wild facial expression datasets AffectNet and RAF-DB, our algorithm outperforms the methods proposed by other researchers, reaching accuracies of 55.0% and 83.5%, respectively.
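The two ingredients of the global attention model can be sketched in NumPy under stated assumptions. The first two functions are the standard WGAN objectives computed from critic scores; the critic and generator networks themselves are omitted. The third function is a hypothetical reading of "differential features": the pixel-wise residual between the input face and its generated reference expression, normalized into an attention map. The intuition, which is an assumption of this sketch rather than a detail given in the abstract, is that identity-related content reproduced by the generator, such as facial contours, hair, and glasses, cancels in the residual.

```python
import numpy as np


def wgan_critic_loss(scores_real, scores_fake):
    """WGAN critic objective to minimise: E[D(fake)] - E[D(real)]."""
    return float(np.mean(scores_fake) - np.mean(scores_real))


def wgan_generator_loss(scores_fake):
    """WGAN generator objective to minimise: -E[D(fake)]."""
    return float(-np.mean(scores_fake))


def differential_features(face, reference, eps=1e-8):
    """Residual between the input face and its generated reference expression (hypothetical).

    Regions the generator reproduces faithfully cancel out; the absolute residual
    is normalised into a [0, 1] attention map over the whole image.
    """
    diff = face.astype(np.float64) - reference.astype(np.float64)
    magnitude = np.abs(diff)
    attention = magnitude / (magnitude.max() + eps)
    return diff, attention
```

In a full model the residual or attention map would feed a feature extractor and classifier; the WGAN losses only govern how the reference generator is trained.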
3. An expression recognition method based on adaptive attention is proposed
The global attention-based expression recognition method generates reference expressions with a GAN. However, a GAN can only fit the distribution of the training set. When the test set differs too greatly from the training set, the quality of the generated reference expressions degrades, which in turn reduces expression recognition accuracy. To improve the generalization ability of expression recognition algorithms in natural scenes, we propose a novel network model called the "dimension reduction network". This model prepends a dimension reduction module to a general classification network to reduce the dimensions of the input images. The dimension reduction operation forces the neural network to discard some features. The classification error of the classification network is propagated back to the dimension reduction module through gradient back-propagation, guiding the module to learn expression-related features. This module not only reduces the size of the feature map, which cuts computation and improves speed, but also implicitly weights the importance of each pixel. As a result, invalid features are removed while useful information in the image is retained as much as possible. The dimension reduction module can be placed in front of any general classification network; even when the classification network is large, it can effectively reduce the network's generalization error and improve classification accuracy. On the AffectNet dataset, our dimension reduction network achieved the best accuracy, surpassing the second-best method by 1.2 percentage points.
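The dimension reduction idea can be illustrated with a deliberately small NumPy model, assuming a module made of learnable per-pixel weights (passed through a sigmoid) followed by 2x2 average pooling, prepended to a linear softmax classifier. The abstract does not describe the module's actual architecture, so this design and the class name `ReducedClassifier` are hypothetical. The point of the sketch is the gradient path: the classification error is back-propagated through the pooling step into the per-pixel weights, so pixels that help classification receive larger weights while the input passed to the classifier shrinks fourfold.

```python
import numpy as np


def avg_pool2(x):
    """2x2 average pooling with stride 2; x has shape (H, W) with even H and W."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


class ReducedClassifier:
    """Hypothetical dimension reduction module (per-pixel weights + 2x2 pooling)
    prepended to a linear softmax classifier and trained end to end."""

    def __init__(self, h, w, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.pixel_logits = np.zeros((h, w))  # sigmoid(0) = 0.5: all pixels start equal
        self.W = rng.normal(0.0, 0.01, (n_classes, (h // 2) * (w // 2)))
        self.b = np.zeros(n_classes)

    def forward(self, x):
        s = sigmoid(self.pixel_logits)        # implicit per-pixel importance
        r = avg_pool2(x * s).ravel()          # reduced input: 4x fewer values
        z = self.W @ r + self.b
        p = np.exp(z - z.max())
        return s, r, p / p.sum()

    def step(self, x, y, lr=0.1):
        """One SGD step on cross-entropy; returns the loss before the update."""
        s, r, p = self.forward(x)
        loss = -np.log(p[y] + 1e-12)
        dz = p.copy()
        dz[y] -= 1.0                          # dL/dz for softmax + cross-entropy
        dr = (self.W.T @ dz).reshape(x.shape[0] // 2, x.shape[1] // 2)
        dxs = np.repeat(np.repeat(dr, 2, axis=0), 2, axis=1) / 4.0  # back through average pooling
        dlogits = dxs * x * s * (1.0 - s)     # back through x * sigmoid(pixel_logits)
        self.W -= lr * np.outer(dz, r)
        self.b -= lr * dz
        self.pixel_logits -= lr * dlogits     # classification error reaches the module
        return loss
```

In a real network the same structure would be expressed with an autograd framework and a convolutional backbone; the manual gradients here only keep the sketch free of dependencies other than NumPy.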