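Research on Text Sentiment Analysis Methods Incorporating User Information (融合用户信息的文本情感分析方法研究) | |
Li Junjie (李俊杰) | |
2018-12-04 | |
Pages | 100 |
Degree type | Doctoral |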
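Chinese abstract | With the rapid development of Internet technology, more and more users post comments online about products, social events, government policies, and similar topics. Automatically collecting these subjective texts and analyzing their sentiment can reveal users' attitudes toward products or events and helps the relevant companies or departments obtain timely feedback on products or policies. Sentiment analysis research therefore has great practical significance.
Different users have different preferences in the words they use to express sentiment, in the product aspects they pay attention to, and in how they score products. These preferences are very important for document-level sentiment classification.
2. Three strategies for incorporating user attributes into document-level sentiment classification. User reviews differ not only according to individual user preferences; they also show certain regularities within groups of users who share the same attributes (such as age and gender).
3. A multi-aspect sentiment classification method that incorporates multiple types of information. The goal of aspect-level sentiment classification is to predict the sentiment label of each aspect in a review text, where an aspect is a facet of the product that is to be evaluated.
4. A user-aware sequence network model for personalized sentiment summarization. Existing sentiment summarization methods neglect modeling of the users themselves and often cannot generate different summaries for different users. In fact, for the same product, different users attend to different aspects, so summaries for different users should differ. To address personalization in sentiment summarization, this thesis proposes a user-aware sequence network model built on the conventional sequence-to-sequence model. When generating a summary, the model can incorporate both the differences in what users attend to in the review content and each user's characteristic wording habits. Experiments show that the proposed method significantly outperforms the conventional sequence-to-sequence model and can generate personalized sentiment summaries for different users.
In summary, this thesis studies in depth how user information can be used to improve existing sentiment analysis methods. It examines the influence of user IDs and user attributes on sentiment classification and sentiment summarization and proposes a series of models that incorporate these two kinds of information, which effectively improves performance on both tasks. The results have strongly advanced research in this field. |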
English abstract | With the development of the Internet, more and more people write reviews about [...]
1. Incorporating multi-level user preference into document-level sentiment classification. Different users have different word-use habits when expressing opinions, care about different aspects of a product, and score reviews in characteristically different ways. These user preferences may be helpful for document-level sentiment classification. This thesis proposes a Hierarchical User Attention Network (HUAN) to model these three kinds of user preference jointly. Specifically, HUAN encodes different kinds of information (word, sentence, aspect, and document) in a hierarchical structure and introduces user embeddings and a user attention mechanism to model these preferences. We conduct experiments on two real-world datasets. The experiments show that these three kinds of user preference can boost the performance of sentiment classification. Compared with models that do not consider user information, our method yields an improvement of 3%. Furthermore, HUAN can also mine the product attributes that are important to different users.
2. Proposing three strategies to merge user attributes into document-level sentiment classification. Beyond the effect of user ID on sentiment classification, we find that user attributes can also improve sentiment classification performance. People in different groups may have different preferences regarding products. For example, a young man may love iPhones, whereas an older man may prefer phones that are easy to use. We propose three strategies to take user attributes into account: (1) treat them as features; (2) design a graph-based method to model the relationship between tweets posted by users with similar attributes; (3) combine the two aforementioned strategies. Experiments show that the three strategies obtain improvements of 1.9, 0.9, and 2.2 percent, respectively.
3. Incorporating multi-level information into document-level multi-aspect sentiment classification. Document-level multi-aspect sentiment classification aims to predict a user's sentiment polarities for different aspects of a product in a review. Existing approaches mainly focus on text information and ignore the authors (i.e., users) and the overall ratings of reviews. We propose a model called Hierarchical User Aspect Rating Network that considers user preference and overall ratings jointly, and we adopt a multi-task framework to reinforce it. Empirical results show that, compared with the baseline and the state-of-the-art method, our method obtains improvements of 6.0 and 1.7 percent, respectively.
4. Proposing a user-aware sequence network to perform personalized review summarization. Existing sentiment summarization methods ignore users and generate a single summary for all users. However, different users care about different aspects of a product. We therefore first raise the problem of personalization in sentiment summarization. We then propose a user-aware sequence network to perform the task, which incorporates aspect-level user preference and user-specific word-use habits. To validate our model, we collect a new dataset comprising reviews, summaries, and users. Empirical results show that our model is significantly better than the basic sequence-to-sequence model. Furthermore, our method can generate different summaries for different users.
In summary, this thesis focuses on incorporating user information into sentiment analysis. We study the effects of user ID and user attributes on sentiment classification and summarization and propose a series of models that take them into account, boosting the performance of both tasks. These results help advance research in this area. |
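Keywords | natural language understanding; sentiment analysis; user information; sentiment classification; sentiment summarization |
Language | Chinese |
Document type | Thesis |
Item identifier | http://ir.ia.ac.cn/handle/173211/23063 |
Collection | Graduates_Doctoral dissertations |
Recommended citation (GB/T 7714) | 李俊杰 (Li Junjie). 融合用户信息的文本情感分析方法研究 (Research on Text Sentiment Analysis Methods Incorporating User Information) [D]. Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Haidian District, Beijing. University of Chinese Academy of Sciences, 2018. |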
Files in this item |
File name/size | Document type | Version type | Access type | License |
最终版_李俊杰_2019-01-28 带(2698KB) | Thesis | | Restricted access | CC BY-NC-SA |