User Interaction Intent Analysis for Social Media


	User Interaction Intent Analysis for Social Media
	Cui Chenxi (崔宸熙)
	2017-05-28
Degree type	Master of Engineering
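Abstract (translated from Chinese)	In recent years, as social media (Weibo, Twitter, Facebook, etc.) has developed and spread, people increasingly rely on it to share personal experiences, voice opinions, and express wishes, producing a massive volume of user-generated content. Interaction intents are pervasive in user discussions on social media. Mining and analyzing the intents behind users' interactive behavior can effectively support public opinion monitoring and decision support, and has significant research and practical value in many fields. This thesis focuses on mining user interaction intents in social media. Using intelligent analysis techniques, it studies a classification of user interaction intents grounded in speech act theory, together with methods for recognizing those intents, and validates the proposed recognition methods on Sina Weibo data. The work covers three aspects:
1. Construction of a classification scheme for user interaction intents, and an intent recognition method based on performative dictionaries. To address the problem that existing definitions of user intent categories are scattered and domain-dependent, the thesis draws on the speech act classification framework to propose a classification scheme for user interaction intents in social media. On this basis, it proposes a recognition method that is based on performative dictionaries and incorporates external information sources: a performative dictionary is built for each intent category, and user interaction intents are classified with these dictionaries. Experiments show that the proposed dictionary construction method mines performatives from large-scale text fairly accurately for use in user interaction intent recognition.
2. Dictionary-based automatic corpus labelling and an intent recognition method based on generalized features. To overcome the difficulty of annotating large-scale corpora, the thesis proposes a method that labels the corpus automatically using the performative dictionaries. On this basis, it mines syntactic, semantic, and social media features from terms and phrases and combines them with learning algorithms to classify user interaction intents. Experiments show that the proposed automatic labelling method effectively improves the precision of large-scale corpus annotation, and that mining the extracted features effectively improves intent recognition.
3. A hierarchical hidden Markov model (HMM) for interaction intent recognition based on dialogue sequences. Considering the influence of context on user interaction intents, and combining processing at the sentence and discourse levels, the thesis proposes an interaction intent recognition model based on a hierarchical HMM. Building on sentence-level intent recognition, the model identifies the main interaction intent expressed in a complete microblog (including original posts, reposts, and replies). Experiments show that the proposed hierarchical HMM model effectively combines information from different levels and improves intent recognition at both the sentence level and the microblog level.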
English abstract	With the continuous development and expansion of social media, people increasingly share experiences, exchange opinions, and express wishes on social media platforms (microblogs, Twitter, Facebook, etc.), producing massive amounts of user-generated content. This valuable content, especially users' online interactions on social affairs and public events, reveals a variety of communicative purposes that implicitly express user intentions. Recognizing intents in users' online interactive behavior from social media data can effectively identify the motives and intents behind communication and provide vital information to aid monitoring, analysis, and decision-making in many fields. Focusing on the problem of mining intentions from online interactive content in social media, this thesis aims to build a general classification scheme based on speech act theory and to develop several approaches, based on social media analytics and intelligence technologies, for classifying users' utterances about hot events into intent categories. We evaluate the effectiveness of the proposed approaches using data from a popular social media platform, Sina Weibo. The thesis covers three aspects:
1. Existing definitions of users' online intents depend heavily on corpus and domain. To deal with this problem, we build a classification scheme of user intents in online interactions based on speech act theory, which classifies users' utterances into intent categories according to their pragmatic functions. On this basis, we propose a dictionary-based classification approach that automatically constructs a performative dictionary using external information sources. Given the performative dictionary, we can recognize user intents by dictionary lookup. An experimental study using a microblog dataset on public safety events from Sina Weibo shows that our approach can construct a high-quality performative dictionary, which provides effective knowledge for user intent classification and recognition.
2. To deal with the problem of massive data annotation, we propose an automatic method to label the user intent corpus. In the feature-based classification approach, we first analyze the syntax of each utterance for contextual compensation, pragmatic enrichment, and redundancy filtering. Then, we characterize the semantic, syntactic, and platform features, considering temporal, subjective, and pragmatic factors. Finally, we train feature-based classifiers to identify user intents in their online interactions. An experimental study using an auto-labelled microblog dataset on public safety events from Sina Weibo shows that our approach can ease the dependency on corpus and topic, which improves classification accuracy and effectively identifies user intents in social media.
3. To incorporate the context information of dialogue sequences and to treat text at both the sentence and paragraph levels, we propose an intention recognition approach based on a hierarchical model of dialogue sequences. In this approach, we handle each online text as a sequence of context utterances and identify its multiple intent categories through sequence learning with an HMM. At the same time, we construct dialogue sequences according to retweet and reply relations. With a second-layer HMM, we identify the intents of dialogue sequences and recognize the main intent categories of each online text. An experimental study using a microblog dataset on public safety events from Sina Weibo shows that our approach can effectively identify user intents in online interactions.
Keywords	user interaction intent; speech act theory; interaction intent analysis and recognition
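Language	Chinese
Document type	Thesis
Identifier	http://ir.ia.ac.cn/handle/173211/14840
Collection	Graduates_Master's theses
Author affiliation	Institute of Automation, Chinese Academy of Sciences
First author affiliation	Institute of Automation, Chinese Academy of Sciences
Recommended citation (GB/T 7714)	Cui Chenxi (崔宸熙). User Interaction Intent Analysis for Social Media [D]. Beijing. Graduate University of Chinese Academy of Sciences, 2017.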

Files in this item
File name/size	Document type	Version type	Access type	License
硕士毕业论文-ccx.pdf (1837KB)	Thesis		Restricted access	CC BY-NC-SA