Research on Knowledge-Driven Social Media Misinformation Analysis
Zhang Huaiwen (张怀文)
2021-05-29
Pages: 144
Degree type: Doctoral
Chinese Abstract

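Online social media platforms have become one of the most important media through which people publish, spread, and consume information. In traditional media (radio, television, newspapers, etc.), authoritative institutions release the information. On social media platforms, by contrast, hundreds of millions of users spontaneously publish and share the latest news every day. However, few users carefully check the authenticity of what they share, so a large amount of misinformation can appear and spread on these platforms. Such social media misinformation aims to mislead the public with false messages in order to obtain political, economic, psychological, and other benefits. It has grown into a major threat to political work, public trust, media authority, and the market economy. Research on social media misinformation analysis is therefore of great significance for cleaning up the online environment, maintaining social harmony and stability, and safeguarding national information security.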

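Posts on social media (microblogs, tweets, etc.) are short, multi-modal, and noisy. Social media misinformation analysis techniques therefore usually model post content together with various kinds of external auxiliary information, such as user attributes and propagation structure, in order to mine features that locate misinformation accurately and quickly. One kind of auxiliary information is special: the knowledge that people accumulate in daily life. Examples include concept knowledge extended from textual entities, style knowledge implied by the writing genre, and stance knowledge expressed between the lines. Such knowledge is an important basis on which humans judge a message's credibility, and it can provide rich auxiliary input to social media misinformation analysis algorithms. This work brings knowledge into social media misinformation analysis. It studies knowledge-driven analysis methods that use high-dimensional structured human knowledge to improve the ability to analyze misinformation. The research content and main contributions of this thesis are as follows: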

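1. Knowledge structure construction for social media data. The first task of knowledge-driven social media misinformation analysis is to extract knowledge and knowledge structure from multi-modal social media data. Posts on social media platforms are generally short and often come with multi-modal content such as images. To extract knowledge and knowledge structure from multi-modal social media corpora, this thesis proposes a variational deep graph embedding clustering method. Variational deep graph embedding characterizes the knowledge concepts in the corpus, and hierarchical clustering summarizes the multi-modal knowledge structure. The method jointly exploits factors such as textual-visual correspondence and contextual co-occurrence to discover the semantic hierarchy among concepts without supervision. The model extracts knowledge and knowledge structure from social media corpora and builds a multi-modal social media knowledge graph, which supplies knowledge to social media misinformation analysis methods.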

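2. Concept-knowledge-aware content analysis of social media misinformation. Existing misinformation detection methods tend to look for clues in short message text. They largely ignore the rich conceptual knowledge behind the highly condensed words, yet it is precisely this knowledge that helps humans verify misinformation. This thesis proposes a multi-modal knowledge-aware event memory network. It expands the background knowledge implicit in social media messages from an external knowledge graph and improves misinformation detection by jointly modeling text, knowledge, image, and event features. The multi-modal knowledge-aware network uses the multi-modal social media knowledge graph to retrieve the rich conceptual knowledge behind the text. It then fuses textual, visual, and knowledge features into an effective representation of each social media message. The event memory network extracts event-invariant features from social media messages to further improve the model's robustness.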

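3. Style-knowledge-disentangled genre analysis of social media misinformation. In real-world settings, misinformation analysis algorithms always face newly emerging, urgent events that have no labeled data. Misinformation is also often entangled with event content. Two pieces of misinformation from the same event may differ noticeably in text and image, and misinformation from different events differs even more in expression. This thesis proposes a multi-modal disentangled domain adaptation method. It disentangles the feature space of multimedia posts into an event content space and a genre style space, and it uses a domain style adaptation algorithm to align the style knowledge of different events. The algorithm discards content features that vary with the event and focuses on transferable genre style knowledge. This yields a robust social media misinformation detector that transfers knowledge learned from source events to target events and performs well on newly emerging events.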

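4. Stance-knowledge-assisted audience analysis of social media misinformation. As misinformation spreads, replies that support, doubt, or oppose it (audience stances) keep emerging. This audience stance knowledge can serve as an important indicator of a message's credibility. Introducing the effective knowledge features of the stance detection task into misinformation analysis can effectively improve misinformation detection models. This thesis proposes a multi-modal meta multi-task learning method. It shares a high-level meta-knowledge network to capture the meta-knowledge underlying both tasks, and it predicts the parameters of each task's model from this meta-knowledge. The model's attention mechanism accurately absorbs the semantic knowledge hidden in fine-grained stance labels, further improving misinformation detection.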
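The fourth contribution predicts the parameters of each task's model from shared meta-knowledge. The abstract does not specify the architecture, so the following is a minimal sketch under stated assumptions. It uses a hypernetwork-style design, in which one shared layer maps a learned per-task embedding to that task's hidden-layer weights. The class names `SharedMetaNet` and `TaskModel`, all dimensions, and the three-way stance label set are hypothetical. The stance-label attention mechanism is omitted.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SharedMetaNet:
    """Hypothetical shared meta-knowledge layer (a hypernetwork): it maps a task
    embedding to the hidden-layer weight matrix of that task's model."""

    def __init__(self, task_dim, in_dim, hid_dim, rng):
        self.in_dim, self.hid_dim = in_dim, hid_dim
        # One generator row per generated weight: hid_dim * in_dim outputs in total.
        self.gen = [[rng.gauss(0.0, 0.1) for _ in range(task_dim)]
                    for _ in range(hid_dim * in_dim)]

    def generate(self, task_emb):
        flat = matvec(self.gen, task_emb)
        # Reshape the flat output into a hid_dim x in_dim weight matrix.
        return [flat[i * self.in_dim:(i + 1) * self.in_dim] for i in range(self.hid_dim)]


class TaskModel:
    """One task (e.g. misinformation or stance detection). Its hidden layer is not
    stored; the shared meta network predicts it from the task embedding."""

    def __init__(self, meta, task_dim, num_classes, rng):
        self.meta = meta
        self.task_emb = [rng.gauss(0.0, 1.0) for _ in range(task_dim)]
        # Task-specific output layer (hid_dim -> num_classes).
        self.out = [[rng.gauss(0.0, 0.1) for _ in range(meta.hid_dim)]
                    for _ in range(num_classes)]

    def predict(self, features):
        hidden_w = self.meta.generate(self.task_emb)
        hidden = [math.tanh(z) for z in matvec(hidden_w, features)]
        return softmax(matvec(self.out, hidden))


if __name__ == "__main__":
    rng = random.Random(0)
    meta = SharedMetaNet(task_dim=4, in_dim=8, hid_dim=6, rng=rng)
    misinfo = TaskModel(meta, task_dim=4, num_classes=2, rng=rng)  # real vs. fake
    stance = TaskModel(meta, task_dim=4, num_classes=3, rng=rng)   # support / doubt / oppose
    post = [rng.uniform(-1.0, 1.0) for _ in range(8)]              # stand-in post features
    print(misinfo.predict(post), stance.predict(post))
```

Both `TaskModel` instances hold the same `SharedMetaNet`, so training either task would update the shared generator. That shared generator is where knowledge flows between the tasks. A practical version would use an autodiff framework rather than plain lists.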

English Abstract

Online social media platforms have become one of the most important media for people to post, spread, and consume information. In traditional media (radio, television, newspapers, etc.), authoritative organizations release information. On social media platforms, hundreds of millions of users spontaneously post and share the latest news every day. However, few users carefully check the authenticity of the information they share, which means that a great deal of misinformation may be released and spread on social media platforms. Social media misinformation aims to mislead the public with untrue information to gain political, economic, psychological, and other benefits. It has become a huge threat to political work, public trust, media authority, and the market economy. Analyzing misinformation in social media is of great significance for purifying cyberspace, maintaining social harmony and stability, and safeguarding national information security.

Posts on social media (microblogs, tweets, etc.) are characterized by short text, multiple modalities, and high noise. Social media misinformation analysis techniques therefore generally model message content jointly with user attributes, propagation structure, and other auxiliary information, aiming to mine effective features that locate misinformation accurately and quickly. Among the many kinds of auxiliary information, one is special: the knowledge accumulated in people's daily lives. Examples include concept knowledge, which extends literal entities; style knowledge, which is reflected in the writing form; and stance knowledge, which is expressed between the lines. Knowledge is an important basis for humans to judge the credibility of posts, and it can provide abundant auxiliary input for social media misinformation analysis algorithms. This thesis introduces knowledge into social media misinformation analysis and studies knowledge-driven analysis methods that use high-dimensional structured human knowledge to improve misinformation analysis. The contributions of this thesis are summarized as follows:

1. A knowledge structure construction method for social media data. The primary task of knowledge-driven social media misinformation analysis is to extract knowledge and knowledge structure from multi-modal social media data. Posts on social media platforms are usually short and accompanied by multi-modal content such as pictures. This thesis proposes a variational deep graph embedding and clustering method to extract knowledge and knowledge structure from multi-modal social media corpora. Variational deep graph embedding captures the knowledge concepts in the corpora, and hierarchical Gaussian mixture model clustering summarizes the multi-modal knowledge structure. The method simultaneously uses textual-visual correspondence, contextual co-occurrence, and other corpus factors to discover the semantic hierarchy among concepts without supervision. The learned model can then extract knowledge and knowledge structure from a social media corpus, construct a multi-modal social media knowledge graph, and provide knowledge for social media misinformation analysis methods.

2. A conceptual-knowledge-aware content analysis method for social media misinformation. Existing misinformation detection methods tend to look for clues in short post text. In doing so they largely ignore the rich conceptual knowledge behind the highly condensed text, which is exactly what helps humans verify misinformation. This thesis proposes a multi-modal knowledge-aware event memory network. It draws the background knowledge hidden in social media posts from external knowledge graphs and improves misinformation detection by simultaneously modeling text, knowledge, image, and event features. The multi-modal knowledge-aware network uses the multi-modal social media knowledge graph to retrieve the rich conceptual knowledge hidden in the text. It obtains an effective representation of social media posts by integrating textual, visual, and knowledge features. The event memory network extracts event-invariant features from social media posts to further improve the model's robustness.

3. A style-knowledge-disentangled genre analysis method for social media misinformation. Real-world misinformation analysis algorithms always face newly emerged, time-critical events that have no labeled data. Moreover, misinformation is often entangled with event content. Two pieces of misinformation from the same event may differ noticeably in text and image, and misinformation from different events differs even more in expression. This thesis proposes a multi-modal disentangled domain adaptation method. It disentangles the feature space of multimedia posts into an event content space and a genre style space, and it uses a domain style adaptation algorithm to align the style distributions of different events. The algorithm discards the content features that change with the event and focuses on learning transferable genre style features. This yields a robust social media misinformation detector that transfers knowledge learned from source events to target events and performs well on new events.

4. A stance-knowledge-assisted audience analysis method for social media misinformation. As misinformation spreads, replies expressing stances such as support, doubt, and denial keep emerging. This user stance information can serve as an important clue for judging the credibility of posts. Multi-task learning can introduce the useful features of the stance detection task into misinformation analysis, which improves the misinformation detection model. This thesis proposes a multi-modal meta multi-task learning method. It captures the meta-knowledge shared by the two tasks through a shared high-level meta-knowledge network layer, and it predicts each task's model parameters from that shared meta-knowledge. The model's attention mechanism effectively absorbs the semantic knowledge hidden in the fine-grained stance labels and further improves misinformation detection.
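Contribution 3 aligns the style distributions of different events, but the abstract does not name the alignment criterion. For illustration only, the sketch below assumes maximum mean discrepancy (MMD) with a Gaussian kernel, a common choice in domain adaptation. It is not necessarily the thesis's algorithm. The function names, `gamma`, and the toy feature values are hypothetical. Under this assumption, the training objective would add a weighted MMD² term between source-event and target-event style features to the source classification loss.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def style_alignment_loss(source_style, target_style, gamma=1.0):
    """Biased estimate of squared MMD between source-event and target-event
    style features; it is zero when the two samples are identical."""
    return (mean_kernel(source_style, source_style, gamma)
            + mean_kernel(target_style, target_style, gamma)
            - 2.0 * mean_kernel(source_style, target_style, gamma))


if __name__ == "__main__":
    source = [[0.0, 0.0], [0.1, 0.2]]   # toy style vectors from a labeled event
    near = [[0.05, 0.1], [0.1, 0.1]]    # target event with a similar style
    far = [[3.0, 3.0], [3.1, 3.2]]      # target event with a very different style
    print(style_alignment_loss(source, near), style_alignment_loss(source, far))
```

Minimizing this term pulls the target event's style features toward the source event's. Because content features are disentangled into a separate space, only the style space is matched, not the event content.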

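Keywords: social media misinformation; social media data mining; knowledge-driven; multi-modal
Language: Chinese
Classification (seven major directions / sub-direction): Multimodal intelligence
Document type: Dissertation
Item identifier: http://ir.ia.ac.cn/handle/173211/44801
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems / Multimedia Computing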
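Recommended citation (GB/T 7714):
ZHANG Huaiwen. Research on Knowledge-Driven Social Media Misinformation Analysis[D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2021.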
Files in this item:
张怀文-Thesis.pdf (37471 KB); document type: dissertation; access: open access; license: CC BY-NC-SA

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.