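(1) To address the difficulty of applying models trained on small-scale datasets with few categories to more diverse emotion-related body gestures, a body-gesture emotion recognition method based on Generalized Zero-Shot Learning (GZSL) is proposed, which uses manually designed semantic representations to predict categories that were not seen during training. To make full use of body gesture information, the method treats each emotion category as a collection of multiple body gestures and builds a network structure that satisfies the dual constraints of gesture labels and emotion labels. Specifically, a generalized zero-shot learning network with two branches is proposed, a Hierarchical Prototype Network (HPN) and a Semantic Auto-Encoder (SAE), which predict samples of the trained and untrained categories respectively. Using the prior relationship between emotions and gestures, the HPN branch learns prototype centers at the gesture level and then at the emotion level, which enhances the separability of gesture categories and the intra-class similarity of emotion categories. The SAE branch learns a mapping from the feature space to the semantic space and predicts samples of untrained categories through semantic representations that contain both emotion and gesture information. Experimental results on the public MASR dataset show that the proposed method outperforms existing GZSL algorithms and baseline methods that use only one type of label.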
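(2) To address the data distribution gap between a target group with only a few labeled samples and the general group, for which a large amount of training data is available, an Adversarial Transfer Learning with Domain Mixup algorithm is proposed for physiological-signal-based stress detection. By learning features that are independent of group differences, the algorithm transfers stress detection knowledge from the general group to the target group. Specifically, on top of an adversarial transfer network comprising a feature extractor, a domain discriminator, and a stress detector, feature-level domain-mixup samples are constructed to improve the generalization of the domain discriminator. The feature extractor and the domain discriminator have opposing optimization objectives, which allows domain-invariant features to be learned; the three modules are trained jointly so that the features produced by adversarial learning do not degrade stress detection performance. In addition, to address the imbalanced label distribution in the stress detection task, a loss correction method based on estimated class prior probabilities is proposed, which helps improve recognition of the high-stress categories that have fewer training samples. Because no public dataset for a specific target group is yet available, a physiological signal dataset covering the general group and police school students was constructed for experimental validation. The results show that the proposed algorithm outperforms non-transfer-learning baselines and representative transfer learning algorithms.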
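(3) To address the fact that training existing personalized models usually requires a large number of samples from the test subject, an individual emotion and stress detection method based on a Siamese network is proposed. The method builds a relative intensity regression model that captures the relative difference between the two samples of a pair, so that individual calibration needs only a single labeled baseline sample. In addition, to make better use of the segment samples obtained by data segmentation, a new intensity ranking sub-task is constructed to assist the regression task, with ranking rules and a ranking-pair construction method designed around the characteristics of physiological signal data. This sub-task performs pairwise ranking using the relative-strength supervision carried by pairs of segment samples, which strengthens the ability of the extracted features to represent the relative intensity of emotion or stress. The two sub-tasks share the Siamese network for deep feature extraction and are trained in alternating cycles. Experimental results on a self-collected stress dataset and the public DEAP emotion dataset show that the method outperforms single-sample-calibration baselines and performs comparably to personalized models from existing studies that are trained on multiple samples from a single subject.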
With the development of society, mental health issues have received widespread attention. Emotions and stress states are important factors closely related to mental health. Negative emotions and excessive stress impair people's cognition and decision-making. For example, they can readily affect the working state of specific professional groups such as doctors and drivers, creating a risk of accidents. Persistent negative states may even seriously damage physical health or lead to mental illness. Therefore, automatic monitoring methods and systems for emotions and stress states have important research significance and application value. Recently, with the development of deep learning, many studies have applied deep networks to emotion or stress detection. These methods require a large amount of data, collected in a laboratory environment, for model training. Body gestures and peripheral physiological signals can be collected under conditions close to natural ones, using simple and easily implemented procedures, which eases the constraints that the collection environment places on application scenarios. Methods that assess emotions and stress states from body gestures and peripheral physiological signals have therefore received extensive attention. However, during data collection it is difficult to induce different states and to collect sufficient training samples. This can leave the training data incomplete in several ways, such as missing categories, insufficient data for specific groups, and insufficient data for individual test subjects. This thesis focuses on body gestures and peripheral physiological signals.
To address these sample-missing problems, three emotion or stress detection algorithms are proposed to improve accuracy and practicability when data resources are limited. Building on these algorithms and on the complementary relationship between the two kinds of data, a dual-modal warning system for abnormal mental states is further constructed for comprehensive evaluation. The main contributions of this thesis are as follows:
(1) To address the difficulty of applying models trained on small-scale datasets with few categories to more diverse emotion-related body gestures, a Generalized Zero-Shot Learning (GZSL) method is proposed for body-gesture-based emotion recognition, which predicts unseen categories with the help of manually designed semantic representations. To make full use of body gesture information, an emotion category is regarded as a collection of multiple body gestures, and a framework that satisfies the dual constraints of gesture labels and emotion labels is proposed accordingly. Specifically, a generalized zero-shot learning network with two branches is proposed: a Hierarchical Prototype Network (HPN) and a Semantic Auto-Encoder (SAE), which predict samples of the seen and unseen classes respectively. The HPN uses prior knowledge of the relationship between emotions and body gestures to learn prototypes at the gesture level and then at the emotion level, which enhances the separability of gesture categories and the intra-class similarity of emotion categories. The SAE learns the mapping from the feature space to the semantic space and predicts samples from unseen categories through the designed semantic representations, which contain both emotion and gesture information. Experimental results on the public MASR dataset demonstrate that the proposed method outperforms existing GZSL algorithms and baseline methods that use only one type of label.
(2) To address the data distribution gap between a target group with only a few labeled samples and the general group with a large amount of training data, an adversarial transfer learning algorithm with domain mixup is proposed for physiological-signal-based stress detection. The algorithm transfers stress detection knowledge from the general group to the target group by learning domain-invariant features. Specifically, on top of an adversarial transfer network comprising a feature extractor, a domain discriminator, and a stress detector, domain-mixup samples are constructed at the feature level to improve the generalization of the domain discriminator. The feature extractor and the domain discriminator are optimized adversarially, which promotes the learning of domain-invariant features. The three modules are trained jointly so that the features produced by adversarial learning do not degrade the performance of the stress detector. In addition, to address the imbalanced label distribution in the stress detection task, a loss correction method based on estimated class prior probabilities is proposed to improve recognition of the high-stress categories, which have fewer training samples. Because no public dataset for a specific target group is yet available, a physiological signal dataset covering the general group and police school students was constructed for experimental evaluation. The experimental results demonstrate that the proposed algorithm outperforms non-transfer-learning baselines and representative transfer learning algorithms.
(3) To address the fact that training existing personalized models usually requires a large number of samples from the test subject, an emotion and stress detection method based on a Siamese network is proposed. The method builds a relative intensity regression model that captures the differences between pairs of samples, so that the personalized model can be calibrated with only one labeled baseline sample. In addition, to make better use of the segment samples obtained by data segmentation, a new intensity ranking sub-task is constructed to assist the regression task, with ranking rules and a sample-pair construction method designed around the characteristics of physiological data. The intensity ranking sub-task performs pairwise ranking using the relative-intensity supervision carried by segment pairs, which strengthens the ability of the extracted features to represent the relative intensity of emotion or stress. The two sub-tasks share the Siamese network for feature extraction and are trained alternately. Experimental results on a newly collected stress dataset and the public DEAP emotion dataset demonstrate that the proposed method outperforms baselines based on single-sample calibration and performs comparably to personalized models trained with multiple samples from a single subject.
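As a rough illustration of the two ideas in contribution (3), the Python sketch below shows a logistic pairwise ranking loss on two scalar intensity scores and a single-sample calibration step. It is a minimal sketch under stated assumptions, not the thesis's implementation: the abstract does not give the loss form or the network outputs, so the logistic loss, the scalar scores, and the names `pairwise_ranking_loss` and `calibrate` are all hypothetical.

```python
import math


def pairwise_ranking_loss(score_a, score_b, a_is_stronger):
    """Logistic pairwise ranking loss for one pair of segment scores.

    Assumption: each branch of the Siamese network yields a scalar
    intensity score, and the label only says which segment is stronger.
    """
    # Signed margin: positive when the scores are ordered as the label says.
    margin = (score_a - score_b) if a_is_stronger else (score_b - score_a)
    # log(1 + exp(-margin)), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def calibrate(predicted_difference, baseline_label):
    """Single-sample calibration (assumed form).

    The regressor is assumed to predict how much stronger the query
    sample is than the subject's one labeled baseline sample, so the
    absolute estimate is the baseline label plus that difference.
    """
    return baseline_label + predicted_difference
```

A correctly ordered pair yields a smaller loss than a reversed one, and a tie yields log 2, so minimizing the loss pushes the scores of stronger segments above those of weaker ones.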
Building on the above stress and emotion detection algorithms, a dual-modal warning system for abnormal mental states is further constructed for experimental verification. By exploiting the complementarity of body gestures and physiological signals, the system provides comprehensive assessment and warning of abnormal states such as negative emotions and excessive stress. To evaluate the system, different psychological responses were induced in subjects with video materials, and a dual-modal database containing body gestures and physiological signals was built. Experimental results on this dataset demonstrate that the proposed system performs significantly better than the single-modal algorithms.
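The class-prior loss correction mentioned in contribution (2) can be illustrated with a generic prior-based logit adjustment, sketched below in Python. This is a stand-in technique chosen for illustration and may differ from the formulation used in the thesis. The function names are hypothetical, and the code assumes that every class appears at least once in the training labels so that each estimated prior is positive.

```python
import math


def estimate_priors(labels, num_classes):
    """Estimate class prior probabilities from training label frequencies."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    total = len(labels)
    return [c / total for c in counts]


def prior_corrected_cross_entropy(logits, label, priors):
    """Cross-entropy computed on logits shifted by the log class priors.

    Assumption: all priors are strictly positive. Shifting by log(prior)
    makes rare classes cost more at equal logits, so training pushes the
    model to score them higher.
    """
    adjusted = [z + math.log(p) for z, p in zip(logits, priors)]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(adjusted)
    log_norm = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_norm - adjusted[label]
```

With three low-stress samples and one high-stress sample, the priors are 0.75 and 0.25, and at equal logits the high-stress class incurs the larger loss, which is the behaviour wanted for an under-represented category.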
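|Keyword||emotion recognition; psychological stress detection; zero-shot learning; transfer learning; Siamese network|
|Is Representative Paper||Yes|
|Sub direction classification||Artificial intelligence + healthcare|
|Planning direction of the national key laboratory||Other|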
|武金婷. 面向样本缺失场景的情绪与压力状态评估方法研究[D]. 中国科学院自动化研究所. 中国科学院自动化研究所,2022.|