Research on Enhancement Methods for Knowledge Graph Based Fact Checking (基于知识图谱的事实核查增强方法研究)
王帅
2022-05-22
Pages: 120
Subtype: Doctoral
Abstract

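Information spreads on the Internet quickly and at low cost. These properties make communication convenient, but they also allow false information to spread widely. To detect false information effectively, the fact checking task uses evidence from external data sources to judge whether a statement under verification is true or false. Knowledge graphs, as structured knowledge bases containing large amounts of high-quality, unambiguous factual knowledge, have become the most important information source for fact checking. This dissertation studies knowledge graph based fact checking. The topic is an important research problem in web media analysis, knowledge reasoning, and text mining, and it has good application value in areas such as national and public security and business. Most existing knowledge graph based fact checking methods overlook the overfitting caused by the sparsity of the knowledge graph during representation learning and fact verification, and they make little effective use of the rich semantic information between entities, such as domain knowledge structure, which limits their performance. In addition, existing methods can handle only a single claim, whereas in complex real-world situations the statement to be checked often takes the form of multiple claims.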

 

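This dissertation studies knowledge-driven and knowledge-enhanced methods for representation learning and verification. It uses the external knowledge provided by knowledge graphs to build structural representation, knowledge enhancement, and inference methods for single-claim fact checking and for multi-claim fact checking in complex situations. On the one hand, it designs algorithms that mine the features of claims and draw on the intrinsic structure of knowledge to strengthen representation learning for verifying single claims. On the other hand, for fact verification in complex situations, it learns an overall semantic representation and uses semantic fusion and semantic interaction enhancement to better learn the composed semantics of multiple claims.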

 

The main contributions and innovations of this dissertation are summarized as follows:

 

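1. Previous studies overlook the category hierarchy information between entities, and the sparsity of the knowledge graph causes overfitting in entity representation learning. To address these problems, this dissertation proposes an end-to-end single-claim fact checking method based on hierarchical prototype learning. The method introduces prototype learning into the fact checking task and uses the category hierarchy of entities to enhance entity representation learning. It learns a prototype representation for each category and optimizes entity representation learning based on these prototypes, so that representations of entities in the same category move closer together while those of entities in different categories move apart. Experimental results verify the effectiveness of the proposed method.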

 

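2. Previous studies overlook knowledge structure information during entity representation learning and fact verification, which limits the performance of fact checking methods. To address this problem, this dissertation proposes a knowledge structure driven enhancement method for single-claim fact checking. The method uses domain knowledge structure information to enhance both entity representation learning and fact verification. Building on prototype learning, it enriches the semantic representation of each entity by aggregating the entity's attribute nodes with a graph neural network, and it extends the single semantic matching scheme used previously with a knowledge structure driven, combined verification method that improves fact verification. Experimental results verify the effectiveness of the proposed method.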

 

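3. Previous work on knowledge graph based fact checking addresses only the single-claim case. In complex real-world situations, however, the statement to be checked is often expressed as multiple claims, and applying existing single-claim methods to multi-claim fact checking often produces wrong results. This dissertation therefore establishes, for the first time, a multi-claim fact checking method based on semantic composition. The method composes the semantics of multiple claims into an overall semantic representation used for fact checking. The composition process enlarges the receptive field of a graph convolutional network to better learn the global semantic representation of the claims, while also incorporating the local semantic representations of salient single claims. Experimental results verify the effectiveness of the proposed method.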

 

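4. To learn a better overall semantic representation for multiple claims on top of semantic composition, this dissertation proposes a fact checking method that uses topic information and models the semantic interaction among multiple claims at a fine granularity. The method uses the topic information of the statement to be checked to enhance the composed semantic representation of the claims. With reinforcement learning as the model framework, it extracts topic information and text structure information from the multi-claim text to guide multiple agents in aggregating semantically coherent claim clusters at the entity level, and it uses a hierarchical attention mechanism to fuse the interaction semantic representations learned by each agent. Experimental results verify the effectiveness of the proposed method.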

Other Abstract

Information dissemination on the Internet is fast and cheap. These characteristics make communication convenient, but they also enable the massive spread of false information. To detect false information effectively, fact checking retrieves evidence from external sources to verify the correctness of unverified information. As structured knowledge bases containing a large number of high-quality, unambiguous facts, knowledge graphs have become the most important external source for fact checking. This dissertation focuses on knowledge graph based fact checking. It is an important research topic in Web media analytics, knowledge reasoning, and text mining, and it has good application value in domains such as national security, public security, and business. Most existing methods for knowledge graph based fact checking ignore the overfitting problem caused by the sparsity of the knowledge graph in representation learning and fact verification, and they cannot fully utilize the rich semantic information between entities, such as domain knowledge structures. Consequently, the performance of these methods suffers. Furthermore, existing fact checking methods can handle only single-claim statements, whereas in complex real-world situations, unverified statements are often composed of multiple claims.

  

This dissertation studies knowledge-driven and knowledge-enhanced methods for representation learning and verification. It uses the external knowledge provided by knowledge graphs to establish structural representation, knowledge enhancement, and inference methods for single-claim fact checking as well as multi-claim fact checking in complex situations. On the one hand, by mining the features of claims and using the intrinsic structure of knowledge, it designs algorithms that enhance representation learning for the verification of single-claim statements. On the other hand, for multi-claim fact checking in complex situations, it designs algorithms that learn the overall semantic representation of multiple claims and enhance that representation through semantic fusion and semantic interaction.

 

The main research contributions of this dissertation are summarized as follows:

  

1. Previous studies largely ignore the category hierarchy of entities, and the sparsity of the knowledge graph leads to overfitting in entity representation learning. To alleviate these problems, this dissertation proposes an end-to-end single-claim fact checking method based on hierarchical prototype learning. The method introduces prototype learning into the fact checking task and uses the category hierarchy of entities to enhance entity representation learning. It first learns a prototype representation for each category, and then optimizes the entity representations based on these prototypes to improve intra-category compactness and inter-category separation. The experimental results verify the effectiveness of the proposed method.

 

2. Previous studies ignore knowledge structure in entity representation learning and fact verification, which limits the performance of fact checking methods. To this end, this dissertation proposes a knowledge structure enhanced single-claim fact checking method. The method uses domain knowledge structure to enhance both entity representation learning and fact verification. Building on prototype learning, it enriches the representations of entities by aggregating their attribute nodes through a graph neural network. It then extends the existing semantic matching based verification with a combined, knowledge structure driven fact verification method. The experimental results verify the effectiveness of the proposed method.

 

3. Previous fact checking methods focus only on verifying a single claim. Real-world statements, however, are more complex and are often expressed as multiple claims, and applying existing single-claim fact checking methods to the multi-claim fact checking task often leads to erroneous results. To this end, this dissertation proposes the first multi-claim fact checking method based on semantic composition. The method composes the semantics of multiple claims to learn an overall representation for fact verification. It learns the global representation of multiple claims by expanding the receptive field of a graph convolutional network, and it also incorporates the local semantic representations of salient individual claims. The experimental results verify the effectiveness of the proposed method.

 

4. To learn a better semantic representation for multiple claims on the basis of semantic composition, this dissertation proposes a fact checking method that uses topic information and models fine-grained semantic interaction between claims. The method uses the topic information of the textual statement to enhance the representation of multiple claims. With reinforcement learning as the framework, it uses topic and text structure information extracted from the textual statement to guide multiple agents in aggregating semantically coherent claim clusters at the entity level. It then uses a hierarchical attention mechanism to fuse the representations learned by each agent. The experimental results verify the effectiveness of the proposed method.
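As a rough illustration of the objective described in the first contribution (entities of the same category drawn toward a shared prototype, entities of other categories kept apart), the following is a minimal Python sketch. It is not the dissertation's model: it assumes a single flat level of categories rather than a hierarchy, takes each prototype to be the mean of its member embeddings, and uses a hinge margin for separation. The names `category_prototypes`, `prototype_loss`, and `margin` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def category_prototypes(embeddings, categories):
    """Assumption: a prototype is the mean embedding of its category members."""
    groups = defaultdict(list)
    for entity, vec in embeddings.items():
        groups[categories[entity]].append(vec)
    return {
        cat: [sum(col) / len(vecs) for col in zip(*vecs)]
        for cat, vecs in groups.items()
    }


def prototype_loss(embeddings, categories, margin=1.0):
    """Average of a pull term (compactness) and a push term (separation)."""
    protos = category_prototypes(embeddings, categories)
    total = 0.0
    for entity, vec in embeddings.items():
        own = categories[entity]
        # Pull: stay close to the prototype of the entity's own category.
        pull = sq_dist(vec, protos[own])
        # Push: penalise being within `margin` of any other category's prototype.
        push = sum(
            max(0.0, margin - sq_dist(vec, proto))
            for cat, proto in protos.items()
            if cat != own
        )
        total += pull + push
    return total / len(embeddings)
```

In this sketch a lower loss means tighter, better separated categories; a trained model would minimise such a term alongside the fact checking objective, which is omitted here.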
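The attribute aggregation mentioned in the second contribution can be pictured as one message-passing step in which an entity's representation is mixed with those of its attribute nodes. The sketch below assumes plain mean aggregation with a fixed mixing weight and no learned parameters, which is a simplification and not the graph neural network used in the dissertation. The function name `aggregate_attributes` and the parameter `self_weight` are hypothetical.

```python
def aggregate_attributes(entity_vec, attribute_vecs, self_weight=0.5):
    """One mean-aggregation step over an entity's attribute nodes.

    Assumption: the updated entity vector is a convex combination of the
    entity's own vector and the mean of its attribute-node vectors.
    """
    if not attribute_vecs:
        # No attribute nodes: the entity representation is left unchanged.
        return list(entity_vec)
    count = len(attribute_vecs)
    neighbour_mean = [sum(col) / count for col in zip(*attribute_vecs)]
    return [
        self_weight * own + (1.0 - self_weight) * mean
        for own, mean in zip(entity_vec, neighbour_mean)
    ]
```

A learned layer would replace the fixed `self_weight` with trainable weight matrices and a nonlinearity; the fixed weight only keeps the example short.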
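The idea of expanding the receptive field in the third contribution can be illustrated with repeated neighbourhood averaging on a small graph: after k propagation steps, a node's representation depends on nodes up to k hops away. This sketch assumes unweighted mean aggregation with self-loops and omits the learned weights of a real graph convolutional network; the function name `propagate` and the toy graph are hypothetical.

```python
def propagate(features, adjacency, hops):
    """Average each node with its neighbours, repeated `hops` times.

    features:  dict mapping node -> feature vector (list of floats)
    adjacency: dict mapping node -> iterable of neighbouring nodes
    """
    current = {node: list(vec) for node, vec in features.items()}
    for _ in range(hops):
        updated = {}
        for node in current:
            # Self-loop plus direct neighbours form the aggregation group.
            group = [node] + list(adjacency.get(node, ()))
            vecs = [current[member] for member in group]
            updated[node] = [sum(col) / len(vecs) for col in zip(*vecs)]
        current = updated
    return current


# Toy chain a - b - c with one-hot features, used only for illustration.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
one_hot = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [0.0, 0.0, 1.0]}
one_hop = propagate(one_hot, chain, hops=1)
two_hop = propagate(one_hot, chain, hops=2)
```

On this chain, node `a` carries no information about `c` after one step, but does after two, which is the sense in which more propagation steps give a more global view of the claims.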
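The fusion step in the fourth contribution can be sketched as attention pooling applied at two levels: first over the claim clusters gathered by each agent, then over the resulting per-agent vectors. The sketch assumes dot-product scoring against a single query vector (a hypothetical stand-in for something like a topic representation) and does not model the reinforcement learning agents themselves. The names `attention_pool` and `hierarchical_fuse` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attention_pool(vectors, query):
    """Weight each vector by softmax(dot(vector, query)) and sum them."""
    scores = [sum(v * q for v, q in zip(vec, query)) for vec in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [
        sum(weight * vec[i] for weight, vec in zip(weights, vectors))
        for i in range(dim)
    ]
    return pooled, weights


def hierarchical_fuse(agent_cluster_vectors, query):
    """Two-level pooling: clusters within each agent, then across agents."""
    agent_vectors = [
        attention_pool(clusters, query)[0] for clusters in agent_cluster_vectors
    ]
    return attention_pool(agent_vectors, query)[0]
```

Using the same query at both levels is a simplification made for brevity; separate learned queries per level would be a natural alternative.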

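Keyword: fact checking; enhanced representation learning; domain knowledge structure; multi-claim semantic composition; semantic interaction enhancement
Subject Area: Artificial Intelligence
MOST Discipline Catalogue: Engineering
Language: Chinese
Document Type: Dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/48515
Collection: Graduates_Doctoral Dissertations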
Recommended Citation
GB/T 7714
王帅. 基于知识图谱的事实核查增强方法研究[D]. 中国科学院自动化研究所. 中国科学院自动化研究所,2022.
Files in This Item:
File Name/Size | DocType | Version | Access | License
王帅-论文-答辩后提交版.pdf (4510KB) | Dissertation | | Restricted access | CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.