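Research on Personalized Sentiment Analysis and Standpoint Mining Methods for Social Media (面向社交媒体的个性化情感分析与立场挖掘方法研究)
Lin Junjie (林俊杰) 1,2
Degree type: Doctor of Engineering
Advisor: Mao Wenji (毛文吉)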
2018-05-27
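Degree-granting institution: University of Chinese Academy of Sciences
Place of degree conferral: Beijing
Keywords: social media analysis; sentiment analysis; standpoint mining; personalized information; topic-based modeling and refinement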
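Abstract: With the deepening development and spread of Internet technology, social media has permeated every aspect of social life and become one of the main channels through which people spread information, share emotions, and express their wishes. Internet users express their sentiments and standpoints toward particular entities, events, or topics by posting content and joining comment threads on social media platforms. Sentiment analysis and standpoint mining of social media text can help uncover public opinion and track its dynamics in a timely manner, and therefore have significant research and application value in areas such as business and security. This thesis focuses on sentiment analysis and standpoint mining of social media text: it explores the use of users' personalized information in sentiment analysis and studies the role of topic information in multiple standpoint mining. The proposed methods are validated on social media datasets, including Sina Weibo and Twitter datasets.
The main work and contributions of this thesis are summarized as follows:
1. We examine the role of personality in social media sentiment analysis and propose, for the first time, a sentiment polarity classification method based on the Big Five personality model. The method groups social media texts by the personality dimensions of their authors, mines the personalized sentiment features associated with each dimension, and uses ensemble learning to fuse the personalized and general sentiment classification results, improving on existing sentiment classification methods. Experiments verify the effectiveness of the proposed personalized social media sentiment analysis method.
2. In standpoint mining, we carry out the first study of standpoint mining concerning multiple entities and propose a multiple standpoint mining method that fuses two layers of topic information. The method uses standpoint-specific topic information in social media text to characterize the lexical features of each standpoint at a fine granularity, and mines standpoint-independent general topics to further improve standpoint classification. Experiments verify the effectiveness of the proposed multiple standpoint mining method.
3. To reduce the manually annotated data that multiple standpoint mining requires while preserving classification performance, we propose a semi-supervised multiple standpoint mining method based on user standpoint consistency and topic information. The method uses self-training to iteratively train a standpoint classification model from a small number of labeled texts and a large number of unlabeled texts, and selects high-confidence classified samples according to user standpoint consistency and topic information to expand the training set. Experiments verify the effectiveness of the proposed semi-supervised multiple standpoint mining method.
4. Building on the proposed semi-supervised method, we further propose a weakly supervised multiple standpoint mining method based on topic modeling. The method first uses sentiment analysis to label the standpoints of a small number of texts automatically, then exploits the intrinsic semantic relatedness among large-scale texts to improve robustness to noisy labels. It extends a topic model to obtain standpoint-discriminative topics and determines the standpoint of a text by topic similarity. Experiments verify the effectiveness of the proposed weakly supervised method for multiple standpoint mining.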
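The personality-grouped classification in contribution 1 can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in rather than the thesis's implementation: the group labels (`high_extraversion`, `low_extraversion`), the word-count polarity scorer, the `weight` parameter, and the weighted average used as the ensemble step are assumptions chosen to keep the example small. It also assumes each text already carries a personality group for its author; how users' Big Five traits are obtained is not stated in this abstract.

```python
from collections import Counter, defaultdict


def train_lexicon(samples):
    """Learn word polarities from (tokens, label) pairs, label in {+1, -1}."""
    lexicon = Counter()
    for tokens, label in samples:
        for tok in set(tokens):
            lexicon[tok] += label
    return lexicon


def score(lexicon, tokens):
    """Average word polarity of a text, clipped to [-1, 1]."""
    if not tokens:
        return 0.0
    raw = sum(lexicon.get(tok, 0) for tok in tokens) / len(tokens)
    return max(-1.0, min(1.0, raw))


def train_ensemble(samples):
    """samples: (personality_group, tokens, label) triples (hypothetical format)."""
    general = train_lexicon([(t, y) for _, t, y in samples])
    by_group = defaultdict(list)
    for group, tokens, label in samples:
        by_group[group].append((tokens, label))
    personalized = {g: train_lexicon(s) for g, s in by_group.items()}
    return general, personalized


def predict(model, group, tokens, weight=0.7):
    """Blend the group-specific and general scores; +1 positive, -1 negative."""
    general, personalized = model
    combined = score(general, tokens)
    if group in personalized:
        combined = weight * score(personalized[group], tokens) + (1 - weight) * combined
    return 1 if combined >= 0 else -1


# Toy data (invented): "quiet" is positive for one group, negative for the other.
samples = [
    ("high_extraversion", ["party", "fun"], 1),
    ("high_extraversion", ["quiet", "evening"], -1),
    ("high_extraversion", ["quiet", "weekend"], -1),
    ("low_extraversion", ["quiet", "evening"], 1),
    ("low_extraversion", ["party", "noise"], -1),
]
model = train_ensemble(samples)
print(predict(model, "low_extraversion", ["quiet", "night"]))   # 1
print(predict(model, "high_extraversion", ["quiet", "night"]))  # -1
```

In this toy setup the general scorer alone would label both texts negative; blending in the group-specific scorer is what flips the prediction for the first author, which is the effect the personalized features are meant to capture.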
Other abstract: With the in-depth development and popularization of the Internet, social media has permeated all aspects of social life and become one of the major channels through which people disseminate information, share emotions, and express their wishes. Internet users express their sentiments and standpoints toward particular entities, events, or topics by publishing content and comments on social media platforms. Sentiment analysis and standpoint mining of social media texts can help people explore public opinion and keep track of its dynamics in a timely manner. They are therefore of great research and application value in areas such as business and security. In this thesis, we focus on sentiment analysis and standpoint mining of social media texts. Specifically, we investigate the application of users’ personalized information to sentiment analysis and explore the effect of topic information on multiple standpoint mining. We carry out experimental studies on social media datasets, including Sina Weibo and Twitter datasets, to evaluate the effectiveness of the proposed sentiment analysis and standpoint mining methods.
The major work and contributions of this thesis are summarized as follows:
1) We investigate the role of users’ personality in sentiment analysis of social media texts, and are the first to propose a sentiment polarity classification method based on the Big Five personality model. This method groups social media texts according to users’ different personality dimensions and mines the corresponding personalized sentiment features. In addition, it employs ensemble learning to merge the results of personalized and general sentiment classification, improving the performance of current sentiment classification methods. Finally, we conduct experimental studies to verify the effectiveness of the proposed personalized sentiment classification method for social media texts.
2) In the area of standpoint mining, we are among the first to study standpoint mining concerning multiple entities, and we propose a multiple standpoint mining method that incorporates two layers of topic information. This method leverages standpoint-related topic information in social media texts to capture the lexical features of different standpoints in a fine-grained way, and mines standpoint-independent general topics to further improve standpoint classification performance. Finally, we conduct experimental studies to verify the effectiveness of the proposed method for multiple standpoint mining.
3) To reduce the demand for manually annotated data in multiple standpoint mining while maintaining classification performance, we propose a semi-supervised method for multiple standpoint mining based on user-level standpoint consistency and topic information. This method leverages a small number of labeled texts and a large number of unlabeled texts to train a standpoint classification model iteratively through self-training. To expand the set of training texts, it selects classified samples with high confidence according to user-level standpoint consistency and topic information. Finally, we conduct experimental studies to verify the effectiveness of the proposed semi-supervised method for multiple standpoint mining.
4) On the basis of the proposed semi-supervised method for multiple standpoint mining, we further propose a weakly supervised multiple standpoint mining method based on topic modeling. This method first employs sentiment analysis to annotate the standpoints of a small number of texts automatically, and then leverages the intrinsic semantic relevance among massive texts to improve its robustness to noisy labels. It extends a topic model to acquire topics that discriminate between different standpoints, and determines the standpoints of texts based on topic similarity. Finally, we conduct experimental studies to verify the effectiveness of the proposed weakly supervised method for multiple standpoint mining.
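The self-training loop in contribution 3 can be sketched as follows. This is an illustrative reading under stated assumptions, not the thesis's algorithm: the tiny Naive Bayes classifier, the `threshold` and `max_rounds` parameters, and the rule that a user's newly labeled text must agree with that user's already known standpoint are all hypothetical choices, and the topic-information criterion mentioned in the abstract is left out entirely.

```python
import math
from collections import Counter, defaultdict


def train_nb(pairs):
    """Train a small multinomial Naive Bayes model from (tokens, label) pairs."""
    word_counts, class_counts, vocab = defaultdict(Counter), Counter(), set()
    for tokens, label in pairs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict_proba(model, tokens):
    """Return {label: probability} with add-one smoothing; unseen words are skipped."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    logs = {}
    for label, count in class_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        logs[label] = math.log(count / total) + sum(
            math.log((word_counts[label][t] + 1) / denom) for t in tokens if t in vocab
        )
    top = max(logs.values())
    exps = {label: math.exp(v - top) for label, v in logs.items()}
    norm = sum(exps.values())
    return {label: v / norm for label, v in exps.items()}


def self_train(labeled, unlabeled, threshold=0.7, max_rounds=5):
    """labeled: (user, tokens, label) triples; unlabeled: (user, tokens) pairs."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = train_nb([(t, y) for _, t, y in labeled])
        known = defaultdict(set)  # standpoints already attributed to each user
        for user, _, label in labeled:
            known[user].add(label)
        accepted, remaining = [], []
        for user, tokens in pool:
            probs = predict_proba(model, tokens)
            label = max(probs, key=probs.get)
            # Assumed consistency rule: one standpoint per user.
            consistent = not known[user] or known[user] == {label}
            if probs[label] >= threshold and consistent:
                accepted.append((user, tokens, label))
            else:
                remaining.append((user, tokens))
        if not accepted:
            break
        labeled.extend(accepted)
        pool = remaining
    return train_nb([(t, y) for _, t, y in labeled]), labeled


# Toy data (invented): the third unlabeled text contradicts u1's known standpoint.
labeled = [("u1", ["vote", "alpha", "strong"], "alpha"),
           ("u2", ["vote", "beta", "fair"], "beta")]
unlabeled = [("u1", ["alpha", "strong", "leader"]),
             ("u2", ["beta", "fair", "plan"]),
             ("u1", ["beta", "fair"])]
model, final = self_train(labeled, unlabeled)
print(len(final))  # 4
```

The classifier is confident about the third unlabeled text, yet it never enters the training set because it conflicts with the standpoint already attributed to its author; confidence alone would have admitted it.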
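Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/21063
Collection: Graduates_Doctoral Theses
Author affiliations: 1. Institute of Automation, Chinese Academy of Sciences
2. University of Chinese Academy of Sciences
Recommended citation
GB/T 7714
Lin Junjie. Research on Personalized Sentiment Analysis and Standpoint Mining Methods for Social Media [D]. Beijing: University of Chinese Academy of Sciences, 2018.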
Files in this item
File name/size: 面向社交媒体的个性化情感分析与立场挖掘方 (6281KB)
Document type: Thesis
Access type: Not yet open
License: CC BY-NC-SA
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.