Research on Event Detection and Popularity Prediction Methods for Social Media (面向社交媒体的事件检测与流行度预测方法研究)

Author	陈观淡 (Chen Guandan)
Date	2019-05-25
Pages	114
Degree type	Doctoral
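Abstract (translated from Chinese)	With the development and spread of the Internet, social media has become an important platform where people voice personal opinions, share information, express emotions, and comment on current affairs. Because it spreads information quickly, reaches a wide audience, and is easy to access, social media has become an important source of information. Event detection for social media can effectively locate the topical events that users care about and automatically filter event information out of massive web data. Users' interactions typically cause some events to receive more attention, that is, to have higher popularity. Popularity prediction for social media goes a step further and analyzes how an event's propagation develops: it uses information from the early stage of propagation to predict the amount of user-generated content discussing the event over a future period. Event detection and popularity prediction for social media are highly important research topics in social media analysis and intelligence. They also help government agencies, businesses, and individuals obtain important public-opinion trends and a basis for decisions, and they have significant research and application value in national and public security, commerce, and other areas. Representation-learning methods learn high-level representations of data in a data-driven way, avoiding tedious feature engineering, and in recent years they have achieved great success in fields such as image recognition and speech recognition. This thesis focuses on event detection and popularity prediction in social media, develops methods for both based on representation learning, and validates the proposed methods experimentally on data from the Twitter social media platform. The main contributions of this thesis are:
1. Previous mainstream work on event detection either uses heuristic similarity functions or relies on the bag-of-words assumption, which makes it hard to obtain a good model. Representation-learning methods are easier to optimize but less interpretable. For social media data, this thesis proposes an event detection model that combines latent-space vector representations with keyword representations. The model combines the ease of optimization of latent vector representations with the better interpretability of keyword representations, and jointly learns event vector representations, a similarity function between microblog posts and events, and event keyword representations.
2. Most previous work on sub-event detection ignores background event information. In addition, because dataset annotation is time-consuming and laborious, there is not yet a large public dataset for sub-event detection. This thesis therefore proposes a sub-event detection model based on unsupervised deep learning. The model first detects sub-events by maximizing the text generation probability. To address data scarcity, it also pretrains model parameters on large-scale external data and transfers them to the sub-event detection model.
3. Event popularity is often influenced by multiple factors such as the user network, text content, and time, and these factors interact in complex ways. Previous mainstream work either considers only part of this information or requires tedious feature engineering. This thesis therefore proposes a popularity prediction model based on information fusion. For the rich text, user, and time-series information in social media, it builds a separate encoder to learn a latent representation of each and predicts popularity through information fusion.
4. Previous work on popularity prediction often ignores the influence of sub-events and other related events on popularity. This thesis therefore proposes a popularity prediction model based on event interrelations. The model uses the text and user information of events to mine the relations between events and their sub-events and among events, then builds a sub-event encoder and a related-event encoder to learn representations of sub-events and of other related events, respectively, for popularity prediction.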
Abstract (English)	With the development and pervasiveness of the Internet, social media has become an important platform for people to express personal views, share information, express emotions, and comment on recent events. Its fast information dissemination, wide coverage, and ease of access make it an important source of information. Social-media-oriented event detection can automatically discover event information in massive web data and locate the topical events that users care about. Users' interactive behavior makes certain events receive more attention, so these events have higher popularity. Popularity prediction further analyzes propagation trends, i.e., it uses early propagation information to predict the amount of user-generated content that will discuss the event in a future time period. Social-media-oriented event detection and popularity prediction are vital research topics in social media analysis and intelligence. They also help governments, enterprises, and individuals acquire important public-opinion trends and decision-making evidence, and they have high research and application value in national and public security, business, and other fields. Representation learning is a data-driven approach to learning high-level representations of data that avoids tedious feature engineering. In recent years, representation-learning-based methods have achieved great success in many fields, such as image recognition and speech recognition. This thesis focuses on event detection and popularity prediction in social media and proposes methods for both based on representation learning. We use datasets collected from the Twitter social media platform to conduct experiments and verify the effectiveness of the proposed methods. The main contributions of this thesis are as follows.
(1) Most previous research on event detection either adopts heuristic similarity metric functions or relies on the bag-of-words assumption, which makes it hard to obtain a good model. Representation learning is easy to optimize but lacks interpretability. Thus, we propose an event detection model for social media data that combines vector representations in a hidden space with keyword representations. The model combines the ease of optimization of hidden-space vector representations with the interpretability of keyword representations. It jointly learns event representations, the similarity metric function between tweets and events, and event keyword representations.
(2) Most previous research on sub-event detection neglects information about background events. In addition, constructing a labeled dataset for sub-event detection is time-consuming and laborious, and there is no large publicly available dataset for sub-event detection. Therefore, we propose a sub-event detection model based on unsupervised deep learning. We detect sub-events by maximizing the probability of text generation. To address data scarcity, we also pretrain the model parameters on a large-scale external dataset and transfer them to our sub-event detection model.
(3) Event popularity is often affected by many factors, such as the user network, text content, and time, and there are complex interactions among these factors. Most previous research either focuses on only part of this information or requires tedious feature engineering. Thus, we propose a popularity prediction method based on information fusion. Three encoders learn hidden-space representations of text, users, and time series, respectively, and event popularity is then predicted through information fusion.
(4) Previous research usually neglects the impact of sub-events and other related events on event popularity. To address this problem, we propose a popularity prediction model based on the interrelations of events. The model mines sub-events and other relations between events using user and text information. It then constructs a sub-event encoder and a related-event encoder to learn sub-event and related-event representations for popularity prediction.
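Keywords	social media analysis; event detection; popularity prediction; event interrelations; representation learning
Discipline	Engineering::Computer Science and Technology (degrees in engineering or science may be conferred)
Language	Chinese
Seven major directions: sub-direction category	Social computing
Document type	Dissertation
Identifier	http://ir.ia.ac.cn/handle/173211/23799
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems_Internet Big Data and Information Security
Recommended citation (GB/T 7714)	陈观淡. 面向社交媒体的事件检测与流行度预测方法研究[D]. 中国科学院自动化研究所. 中国科学院大学,2019.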

Files in this item
File name/size	Document type	Version type	Access type	License
Thesis-cgd-V5.pdf (3562 KB)	Dissertation		Open access	CC BY-NC-SA