Research on Joint Emotion-Cause Extraction Methods for Social Media (面向社交媒体的情绪原因联合抽取方法研究)
陈仲豪
2022-05-19
Pages: 80
Degree type: Master's
Chinese Abstract

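Emotion-cause pair extraction (ECPE) is a subtask of emotion analysis that has attracted wide attention in recent years. ECPE aims to jointly extract emotion clauses and their corresponding cause clauses from a given text, and it extends the earlier task of emotion cause extraction (ECE). Another classic task in emotion analysis is emotion classification, whose goal is to predict the category of the emotion a text conveys. ECPE instead targets the emotion together with the cause that triggers it, which describes a more complete emotion event than the emotion category alone. In practice, ECPE can help dialogue systems detect changes in user emotion and help emotional robots generate soothing replies; it can also help governments and enterprises uncover the causes behind trends in public opinion. The task therefore has both academic significance and practical value.
This thesis focuses on joint emotion-cause extraction in Chinese social media. Existing work concentrates on Chinese news text and overlooks social media, mainly because datasets for Chinese social media are scarce. Yet social media text expresses emotion in diverse ways and contains frequent metaphor and irony, which makes it more challenging. It is also published by real users and reflects public opinion, which makes it more practical. In addition, existing models typically face two problems on the ECPE task: (1) in long texts, emotion-cause pairs are sparse, so positive and negative examples are imbalanced, which limits the generalization ability of the model; (2) existing ECPE methods cannot detect every emotion-cause pair, and although missed pairs can be recovered through human-machine collaboration with an ECE model, existing methods cannot handle ECPE and ECE at the same time. In practice, using a separate model for each task consumes excessive resources during training and deployment. To address these problems, this thesis carries out the following two studies: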

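1. A method for emotion-cause pair extraction that fuses emotion category and intention information
To address the scarcity of data for Chinese social media, this thesis first constructs WeiboEmotion, an emotion-cause pair extraction dataset of 5,009 Chinese Weibo samples, which is currently the largest Chinese dataset in this domain. The dataset defines fine-grained emotion categories and also annotates the user intentions that emotion events give rise to. Building on these annotations, the thesis proposes an ECPE method that fuses emotion category and intention information, in order to examine whether fine-grained emotion categories and intention information help the ECPE task. Experimental results show that the proposed method effectively improves emotion-cause pair extraction in Chinese social media.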

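2. A method for emotion-cause pair extraction based on a generative pre-trained language model
To address the two problems of existing models, this thesis proposes Gen-ECPE, an emotion-cause pair extraction method based on a generative pre-trained language model. By exploiting the properties of generative pre-trained language models, Gen-ECPE predicts emotion-cause pairs directly, avoids constructing candidate emotion-cause pairs, and thereby alleviates the imbalance between positive and negative samples. In addition, the left-to-right, token-by-token generation of autoregressive language models allows a single model to handle both the ECPE and ECE tasks. Experimental results show that the proposed method achieves results comparable to the current state-of-the-art methods on both tasks.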
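
The sparsity behind problem (1) can be illustrated with a small count. The sketch below assumes a pairwise classification setup in which every ordered (emotion clause, cause clause) combination is scored as a candidate. The function name, the 10-clause document, and the single gold pair are made up for illustration and are not taken from the thesis.

```python
from itertools import product
from typing import Dict, Set, Tuple


def candidate_pair_stats(num_clauses: int,
                         gold_pairs: Set[Tuple[int, int]]) -> Dict[str, float]:
    """Count positives and negatives when all clause pairs are enumerated.

    Clauses are indexed from 1; each candidate is (emotion_idx, cause_idx).
    This enumeration is an assumed baseline setup, not the thesis's method.
    """
    candidates = list(product(range(1, num_clauses + 1), repeat=2))
    positives = sum(1 for pair in candidates if pair in gold_pairs)
    return {
        "candidates": len(candidates),
        "positives": positives,
        "negatives": len(candidates) - positives,
        "positive_ratio": positives / len(candidates),
    }


# Hypothetical document: 10 clauses, one annotated pair
# (emotion in clause 4, cause in clause 3).
stats = candidate_pair_stats(num_clauses=10, gold_pairs={(4, 3)})
print(stats)
```

With 10 clauses and one gold pair, a pairwise classifier sees one positive among 100 candidates, and the ratio falls quadratically as the text gets longer.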

English Abstract

Emotion-Cause Pair Extraction (ECPE) is a sub-task of emotion analysis that has attracted extensive attention in recent years. ECPE aims to jointly extract emotion clauses and their corresponding cause clauses from a given text, and it is extended from Emotion Cause Extraction (ECE). Another classic sub-task of emotion analysis is emotion classification, which focuses on predicting the emotion category of a text. However, the goal of ECPE is to detect an emotion together with the cause that triggers it, which is a more complete emotion event than the emotion category alone. In real-world scenarios, ECPE can detect the causes of users’ emotion changes in dialogue systems and help emotional robots produce soothing replies; it can also help governments and enterprises explore the causes behind trends in public opinion. Therefore, this task has important academic research significance and practical application value.
This thesis focuses on emotion-cause pair extraction in the Chinese social media domain. Existing work mainly explores the Chinese news domain and ignores the social media domain, largely because Chinese social media datasets are lacking. However, emotion expression in social media is diverse, and phenomena such as metaphor and irony are common, which makes the domain more challenging. Meanwhile, social media texts are published by real users and reflect trends in public opinion, so ECPE in the Chinese social media domain is also more practical. In addition, existing models usually face two problems when solving the ECPE task: (1) in long texts, emotion-cause pairs are sparse, resulting in an imbalance between positive and negative examples, which restricts the generalization ability of the model; (2) existing ECPE methods cannot detect all emotion-cause pairs, and although an ECE model can be used for human-machine collaborative detection of missed pairs, existing methods cannot handle the ECPE and ECE tasks simultaneously. In real-world scenarios, two models are needed to deal with the two tasks separately, resulting in excessive resource consumption in the training and deployment stages. To alleviate these problems, this thesis carries out the following two pieces of research:

1. A method fusing emotion category and intention information for Emotion-Cause Pair Extraction
To alleviate the lack of Chinese social media datasets, this thesis constructs WeiboEmotion, a Chinese microblog dataset containing 5,009 samples, which is the largest Chinese dataset in this domain. The dataset defines fine-grained emotion categories and annotates the user intentions caused by emotion events. Considering the features of this dataset, this thesis proposes a method fusing emotion category and intention information for Emotion-Cause Pair Extraction, to explore whether the annotation of fine-grained emotion categories and intentions is helpful for the ECPE task. Experimental results show that the proposed method can effectively improve ECPE results in the social media domain.

2. A method leveraging a generative pre-trained language model for Emotion-Cause Pair Extraction
To alleviate the problems of the classification framework, this thesis proposes Gen-ECPE, a method that leverages a generative pre-trained language model for Emotion-Cause Pair Extraction. Gen-ECPE uses the characteristics of the generative pre-trained language model to predict emotion-cause pairs directly, without constructing candidate pairs, which alleviates the imbalance between positive and negative samples. In addition, Gen-ECPE can tackle both the ECPE and ECE tasks in a single model by leveraging the left-to-right, word-by-word generation of the autoregressive language model. Experimental results show that the proposed method achieves results comparable to the mainstream methods for the two tasks.
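
One way to picture how a single generative model could serve both tasks is to cast each of them as a text-to-text problem. The abstract does not give Gen-ECPE's actual input or output format, so everything in the sketch below is hypothetical: the `task:` prefix, the `[cN]` clause markers, the `(cE, cC)` pair notation, and all function names are assumptions chosen for illustration.

```python
import re
from typing import List, Optional, Tuple

PAIR_SEP = " ; "  # hypothetical separator between generated items


def build_source(clauses: List[str], task: str,
                 emotion_idx: Optional[int] = None) -> str:
    """Linearize a document into one input string for a seq2seq model."""
    numbered = " ".join(f"[c{i}] {text}" for i, text in enumerate(clauses, start=1))
    if task == "ECPE":
        return f"task: ECPE | {numbered}"
    if task == "ECE":
        if emotion_idx is None:
            raise ValueError("ECE assumes the emotion clause index is given")
        return f"task: ECE | emotion: c{emotion_idx} | {numbered}"
    raise ValueError(f"unknown task: {task}")


def build_target(pairs: List[Tuple[int, int]], task: str,
                 emotion_idx: Optional[int] = None) -> str:
    """Linearize gold (emotion_idx, cause_idx) pairs into the target string."""
    if task == "ECPE":
        return PAIR_SEP.join(f"(c{e}, c{c})" for e, c in sorted(pairs))
    if task == "ECE":
        return PAIR_SEP.join(f"c{c}" for e, c in sorted(pairs) if e == emotion_idx)
    raise ValueError(f"unknown task: {task}")


def parse_pairs(generated: str) -> List[Tuple[int, int]]:
    """Recover (emotion_idx, cause_idx) pairs from generated ECPE text."""
    return [(int(e), int(c)) for e, c in re.findall(r"\(c(\d+), c(\d+)\)", generated)]


# Made-up three-clause post: the emotion is in clause 3, its cause in clause 2.
clauses = ["I waited two hours for the bus", "and it never came", "so I am really angry"]
gold = [(3, 2)]
print(build_source(clauses, "ECPE"))
print(build_target(gold, "ECPE"))
print(build_source(clauses, "ECE", emotion_idx=3))
print(build_target(gold, "ECE", emotion_idx=3))
```

Because the targets list only the true pairs, no negative candidate pairs are ever built, and switching between the two tasks only changes the input prefix. This is a sketch of the general idea under the stated assumptions, not a reproduction of Gen-ECPE.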

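Keywords: emotion-cause pair extraction; social media; Weibo dataset; emotion cause extraction
Language: Chinese
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/48511
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems_Natural Language Processing
Graduates_Master's Theses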
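Recommended citation
GB/T 7714
陈仲豪. Research on Joint Emotion-Cause Extraction Methods for Social Media (面向社交媒体的情绪原因联合抽取方法研究)[D]. Institute of Automation, Chinese Academy of Sciences, 2022.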
Files in this item
File name/size: 面向社交媒体的情绪原因联合抽取方法研究.(17236KB)
Document type: Thesis
Access type: Open access
License: CC BY-NC-SA