Research on Representation Learning and Reasoning for Dynamic Knowledge Graphs

	Li Mingda (李明达)
	2022-08
Pages	120
Degree type	Doctoral
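Chinese abstract	Knowledge graphs are an effective tool for representing and storing knowledge in the big-data era and are widely used in intelligent question answering, personalized recommendation, event prediction, and other fields. As an important engine for driving machines toward cognitive intelligence, knowledge graphs have become a popular research topic in artificial intelligence. Most knowledge graphs in use today are sparse and incomplete, with many missing links, which reduces their value in practical applications. To alleviate this sparsity, researchers use representation learning to obtain embeddings of knowledge: vector computations in a low-dimensional space reveal hidden associations in the knowledge graph, from which missing links can be inferred. Owing to their computational efficiency and robustness, such methods have attracted wide attention from both academia and industry. However, most popular representation learning methods are designed for static knowledge graphs and assume that knowledge does not change over time. This assumption is clearly too idealized: the real world evolves, and some knowledge is valid only during specific periods. To model dynamic scenarios more effectively, knowledge graphs typically exhibit the following structural features: 1) fixed links between nodes, which model objectively existing factual knowledge; 2) the continual introduction of unseen nodes, which represent newly emerging knowledge; 3) continually changing relations between nodes, which capture temporal knowledge that evolves over time. Consequently, to meet the representation and reasoning needs of dynamic knowledge graphs, existing representation learning methods must overcome three bottlenecks: 1) modeling factual knowledge better, improving reasoning accuracy and the interpretability of reasoning results; 2) efficiently obtaining representations of newly added entities and inferring the missing information related to them; 3) effectively characterizing the semantic associations of entities at different time points, thereby improving reasoning over temporal knowledge. Building on a thorough survey and analysis of existing knowledge graph representation learning methods, this thesis investigates how to mine and exploit the interaction patterns between entities and relations in dynamic knowledge graphs, and makes the following three contributions. 1) For reasoning over factual knowledge, this thesis proposes a representation learning method that incorporates explicit features. The method fuses explicit features retrieved from the knowledge graph into the embedding model, yielding more expressive embeddings and improving reasoning accuracy. First, explicit features describing link properties are constructed under the local closed-world assumption, effectively characterizing the linking preferences of entities and relations. Next, these features are introduced into the score function so that the embedding model captures those linking properties. Finally, a soft-margin ranking loss is designed to better distinguish the influence of positive and negative triples on model optimization, enriching the semantic information encoded in the embeddings. Link prediction and entity classification experiments on four public knowledge graph datasets of factual knowledge show that the method outperforms mainstream representation learning methods on factual knowledge reasoning, handles long-tail entities particularly well, and to some extent makes reasoning results more interpretable. 2) For reasoning about newly added entities, this thesis proposes an inductive representation learning method for new entities based on graph attention networks. The method uses an attention mechanism to aggregate, with learned weights, the neighborhood information and similar entities of a new entity, yielding a more robust representation. First, a similarity measure based on an entity's contextual structure in the graph is proposed to find entities similar to the new entity. Second, a graph attention network that incorporates the query relation is designed to aggregate the new entity's neighbors and similar entities with attention weights, producing a robust representation. Finally, at inference time the method relies only on the pre-trained graph attention network to inductively obtain the new entity's embedding and uses it for reasoning, enabling efficient link prediction for new entities. Link prediction experiments on two public and six self-built datasets for new-entity reasoning show that the method can efficiently induce representations of new entities from limited link information and achieves good reasoning performance on them. 3) For reasoning over temporal knowledge, this thesis proposes a representation learning framework based on time-aware relational constraints. By mining the semantic constraints that a relation imposes on its connected entities at different time points, the framework improves the temporal reasoning ability of existing representation learning methods. First, using popular transformation functions (tensor decomposition and hyperplane projection), two types of time-aware relational constraint terms are designed to capture these constraints effectively. Second, an embedding regularizer is proposed to constrain the knowledge graph embeddings and prevent overfitting during training. On this basis, a temporal regularizer is proposed to constrain the embeddings of temporal information so that they satisfy temporal smoothness. Link prediction experiments on four public temporal knowledge graphs show that the framework effectively improves the reasoning performance of mainstream representation learning methods.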
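The first contribution combines explicit link features with a margin-based ranking loss. The Python sketch below shows one possible reading under stated assumptions: the features only record, per the local closed-world assumption, whether an entity has been observed as head or tail of a relation; the helper names (`lcwa_features`, `feature_augmented_score`, `soft_margin`) and the toy triples are hypothetical; and the rule that a negative violating an observed preference gets a larger margin is an illustrative assumption, not the thesis's exact formulation.

```python
from collections import defaultdict

# Hypothetical toy triples (head, relation, tail); not taken from the thesis datasets.
TRIPLES = [
    ("alice", "born_in", "paris"),
    ("bob", "born_in", "london"),
    ("alice", "works_for", "acme"),
]


def lcwa_features(triples):
    """Record which relations each entity was observed with, as head and as tail.

    Under the local closed-world assumption, an (entity, relation, role) pair
    that never appears is treated as evidence against that link.
    """
    as_head, as_tail = defaultdict(set), defaultdict(set)
    for h, r, t in triples:
        as_head[h].add(r)
        as_tail[t].add(r)
    return dict(as_head), dict(as_tail)


def preference(entity, relation, role, feats):
    """Return 1.0 if `entity` was observed in `role` ('head'/'tail') of `relation`."""
    as_head, as_tail = feats
    table = as_head if role == "head" else as_tail
    return 1.0 if relation in table.get(entity, set()) else 0.0


def feature_augmented_score(embedding_score, triple, feats, weight=0.5):
    """Assumed form: add weighted explicit features to a plausibility score (higher = more plausible)."""
    h, r, t = triple
    return embedding_score + weight * (
        preference(h, r, "head", feats) + preference(t, r, "tail", feats)
    )


def soft_margin(negative, feats, base=1.0, alpha=0.5):
    """Assumed rule: enlarge the margin for each observed preference the negative violates."""
    h, r, t = negative
    violations = (1.0 - preference(h, r, "head", feats)) + (
        1.0 - preference(t, r, "tail", feats)
    )
    return base + alpha * violations


def soft_margin_ranking_loss(pos_score, neg_score, margin):
    """Hinge loss: the positive must outscore the negative by at least `margin`."""
    return max(0.0, margin - pos_score + neg_score)


feats = lcwa_features(TRIPLES)
easy_neg = ("alice", "born_in", "acme")    # 'acme' never seen as a tail of born_in
hard_neg = ("alice", "born_in", "london")  # consistent with observed preferences
print(soft_margin(easy_neg, feats), soft_margin(hard_neg, feats))  # 1.5 1.0
```

With a fixed margin both negatives would be pushed below the positive by the same amount; here the negative that contradicts an observed linking preference must sit further below it. The abstract does not state the exact margin rule, so the `base` and `alpha` values are placeholders.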
English abstract	In the era of big data, the knowledge graph has become an effective tool for knowledge representation and storage and is widely used in fields such as question answering, personalized recommendation, and event forecasting. As an important driver of cognitive intelligence in machines, the knowledge graph has become a hot research topic in artificial intelligence. Existing knowledge graphs are usually sparse and incomplete, with a large number of missing triples, and this incompleteness hinders their applications. To tackle this issue, many representation learning methods have been proposed that embed knowledge into low-dimensional vector spaces. Through vector operations in these spaces, such methods can exploit the interactions implied in the knowledge graph and thereby infer missing facts. Owing to their efficiency and robustness, these methods have recently attracted considerable attention. However, most existing representation learning methods are designed for static knowledge graphs and assume that facts hold at all times. This assumption is clearly unrealistic: the real world is dynamic, and some facts hold only at certain times. To simulate dynamic scenarios more effectively, knowledge graphs usually have the following structural features: i) fixed links between nodes, which model factual knowledge; ii) the emergence of unseen nodes, which represents new knowledge; iii) changing relations between nodes, which reflect temporal knowledge. 
Thus, to meet the knowledge representation and reasoning requirements of dynamic knowledge graphs, existing representation learning methods need to address the following issues: i) how to better model factual knowledge, making the inference process more precise and explainable; ii) how to efficiently obtain representations of newly emerging entities and infer the missing facts involving these unseen entities; iii) how to effectively characterize the semantic interactions between entities at different time steps, resulting in better reasoning performance. Based on an in-depth investigation and analysis of existing representation learning methods, this thesis explores how to mine and leverage the interactions between entities and relations at various time steps, and presents the following three research contributions on representation learning and reasoning for dynamic knowledge graphs: 1. For prediction about factual knowledge, a novel representation learning method is proposed that incorporates observed features into the embedding model, making the inference process more precise and explainable. Specifically, the method introduces observed features according to the local closed-world assumption, which indicate the connection preferences between entities and relations. On this basis, the method designs a score function that incorporates the observed features into the embedding space to capture these connection preferences. Furthermore, the method proposes a soft margin-based ranking loss that uses the observed features to characterize different semantic distances between negative and positive samples, yielding a more semantics-specific model. Extensive experiments were conducted on four public datasets. The results on link prediction and entity classification show that the proposed method outperforms state-of-the-art methods. 
Moreover, it makes the inference process more explainable and achieves a substantial improvement on long-tail entities. 2. For prediction about newly emerging entities, a novel inductive representation learning method is proposed that applies a graph attention network to aggregate the neighborhood of a newly emerging entity together with its similar entities, resulting in a more robust embedding. Specifically, the method designs a similarity-aware function that measures the distance of each entity pair based on the gap between their graph contexts. In addition, the method characterizes the influence of each target entity's neighborhood and similar entities through query-specific attention weights, resulting in a more robust embedding of the target entity. During evaluation, the method can efficiently obtain the embedding by aggregating the neighborhood together with the similar counterparts in the learned aggregation network. Experiments were conducted on two public and six self-built datasets. The results indicate that the proposed method can efficiently obtain embeddings of newly emerging entities from limited facts, achieving better performance on newly emerging entity reasoning. 3. For prediction about temporal knowledge, a novel framework based on time-aware relational constraints is proposed, which exploits the semantic properties implied in temporal knowledge graphs to improve the performance of existing representation learning methods. Specifically, to leverage the semantic properties between a relation and its involved entities at various time steps, the framework proposes two variants of the time-aware relational constraints, based on tensor decomposition and hyperplane projection respectively. Additionally, for each specific constraint, the framework incorporates a suitable embedding regularizer to tackle overfitting. 
On this basis, the framework presents a temporal regularizer to enforce temporal smoothness. Experiments on four public temporal knowledge graphs show that the proposed framework can effectively improve the performance of widely used representation learning methods.
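The third contribution pairs time-aware relational constraints with an embedding regularizer and a temporal smoothness regularizer. The abstract does not give their formulas, so the Python sketch below only illustrates the general shape under assumptions: a HyTE-style projection of an embedding onto a per-timestamp hyperplane as one example of a hyperplane-projection transformation, a squared L2 penalty as the embedding regularizer, and a sum of squared differences between consecutive timestamp embeddings as the smoothness term. All function names and weights are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_to_hyperplane(e, w):
    """Project embedding `e` onto the hyperplane whose normal is `w` (HyTE-style).

    `w` stands in for a learned per-timestamp normal vector; it is normalized here.
    """
    norm = math.sqrt(dot(w, w))
    unit = [x / norm for x in w]
    coef = dot(unit, e)
    return [a - coef * b for a, b in zip(e, unit)]


def embedding_regularizer(vectors, weight=1e-3):
    """Assumed choice: squared L2 penalty summed over the given embeddings."""
    return weight * sum(dot(v, v) for v in vectors)


def temporal_smoothness(time_embeddings, weight=1e-2):
    """Penalize squared differences between embeddings of consecutive timestamps."""
    total = 0.0
    for prev, nxt in zip(time_embeddings, time_embeddings[1:]):
        diff = [b - a for a, b in zip(prev, nxt)]
        total += dot(diff, diff)
    return weight * total


def regularized_loss(task_loss, entity_vecs, time_vecs, lam_emb=1e-3, lam_time=1e-2):
    """Combine a link-prediction loss with both regularizers (hypothetical weighting)."""
    return (
        task_loss
        + embedding_regularizer(entity_vecs, lam_emb)
        + temporal_smoothness(time_vecs, lam_time)
    )


print(project_to_hyperplane([1.0, 2.0], [0.0, 2.0]))  # [1.0, 0.0]
print(temporal_smoothness([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]], weight=1.0))  # 2.0
```

Squared differences make the penalty grow quickly for abrupt jumps between adjacent timestamps while leaving gradual drift cheap, which is the intuition behind temporal smoothness. Other norms are equally plausible, and the abstract does not say which one the thesis uses.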
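Keywords	dynamic knowledge graph; knowledge representation learning; newly emerging entities; knowledge reasoning; link prediction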
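Language	Chinese
Research direction (seven major areas, sub-area)	Knowledge representation and reasoning
State Key Laboratory planned direction	Explainable artificial intelligence
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/49942
Collection	Graduates_Doctoral Dissertations
Corresponding author	Li Mingda (李明达)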
Recommended citation (GB/T 7714)	李明达. 动态知识图谱的表示学习与推理研究[D],2022.

Files in this item
File name/size	Document type	Version type	Access type	License
动态知识图谱的表示学习与推理研究_李明达 (2238 KB)	Thesis		Restricted access	CC BY-NC-SA