Research on Mapping Entity Relation Mentions to Semantic Items in Knowledge Base (面向知识库的实体关系语义映射技术研究)

Title	面向知识库的实体关系语义映射技术研究
Alternative title	Research on Mapping Entity Relation Mentions to Semantic Items in Knowledge Base
Author	刘芳 (Liu Fang)
Date	2014-05-29
Degree type	Doctor of Engineering
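Abstract (translated from Chinese)	Semantic mapping of entity relations is the technique of establishing semantic links between relation mentions in natural language text and attribute relations in a knowledge base. It is one of the key supporting techniques for applications such as large-scale knowledge base construction and semantic search. As research on open information extraction has deepened and knowledge base resources have grown, relation mapping has become a research focus in natural language processing. Two difficulties arise when mapping relation mentions. First, the diversity of linguistic expressions for a relation often makes the wording of relation mentions inconsistent with that of attribute relations. Second, the ambiguity of relation phrases often means that the same relation mention can point to several target semantics in the knowledge base. This thesis focuses mainly on the diversity problem in relation mapping and also explores the ambiguity problem. The main work and contributions are summarized as follows:
1. A relation mapping method that fuses entity pair information with variant expansion of relation names. Because the attribute descriptions in a knowledge base already provide many semantic relations, this thesis uses the attribute names predefined in an existing knowledge base and maps relation mentions to the corresponding attribute names, thereby achieving semantic inference. Because relation mentions take many variant forms when expressing the target semantics, and because of problems such as coincidental matches, a relation mapping algorithm must both capture the latent semantics behind different expressions and cope with the interference these variations introduce. Existing methods attend only to the two entities that form the relation, i.e., they judge the relation expressed when the pair co-occurs from the entity pair alone. This thesis argues that the semantics of a relation mention depends not only on its associated entity pair but also on the relation mention itself. Accordingly, it proposes a relation mapping method that fuses entity pair information with variant expansion of relation names. Specifically, the attribute relations in the knowledge base are first expanded with synonyms, the expanded synonym sets are semantically matched against relation mentions, and the matching result is fused with entity pair information through Stacking to perform relation mapping. Experiments on the PATTY dataset show that the average accuracy of relation mapping reaches 0.744, an improvement of 0.245 over existing instance-dependent methods.
2. A generative model for relation mapping. Existing methods typically label entity pairs that hold a specific attribute relation in the knowledge base back onto text and use data redundancy to infer relation semantics. This assumption uses only the entity pairs that co-occur with a relation mention and ignores other information, so the results are unsatisfactory. Based on this analysis, this thesis proposes three important features for relation mapping: the popularity of the semantic relation itself, the degree to which an entity pair indicates the semantic relation, and the degree of co-occurrence between a relation mention and an entity pair. A generative model fuses these three features to perform relation mapping. The experimental data consist of open relation triples extracted from Wikipedia article text and attribute relation triples from Wikipedia infoboxes. The results show that the average accuracy of relation mapping reaches 0.88±0.02.
3. Graph-based relation mapping. Because of the diversity of relation expressions and the ambiguity of relations, fusing as many features from different perspectives as possible is critical to mapping performance. However, existing relation mapping methods do not make full use of the available resources. This thesis finds the following features to be important for relation mapping: the instances shared by relation mentions and attribute relations, whether an instance can hold multiple attribute relations, the instance overlap between relation mentions, and the similarity between relation mentions and attribute relations. These features also influence one another during mapping. To organize these features and model their interactions, this thesis proposes a graph-based relation mapping method, which constructs a bipartite graph of relation mentions and instances and assigns attribute relations as labels to the corresponding entity pairs and relation mentions...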
English abstract	Semantic mapping of entity relation mentions aims to link relation mentions in natural language text to attribute relations in a knowledge base. It is a key supporting technique for many applications, e.g., knowledge base population and semantic retrieval. Benefiting from fundamental research on open information extraction and the population of knowledge bases, relation mapping has become a hot topic in natural language processing research. There are two main difficulties in relation mapping. One is the mismatch between relation mentions and attributes, since relation mentions vary widely. The other is the ambiguity of relation mentions: a relation mention may refer to different attributes, while multiple attributes may hold between an entity pair. This thesis makes an intensive study of techniques for mapping entity relation mentions to semantic items in a knowledge base. The main contributions and innovations are summarized as follows.
1. Relation Mapping Based on Instances and Attribute Name Semantic Expansion. Because of relation mention variations and coincidental matches, relation mapping algorithms must capture the semantics behind the various expressions while remaining robust to noise. To solve this problem, existing methods rely on entity pair information, i.e., they assume that the more entity pairs a relation mention shares with an attribute, the more likely the mention is to express the semantics of that attribute. We believe that relation mapping relies not only on entity pair information but also on the relation mentions themselves. Therefore, we propose a relation mapping method that combines entity pairs with attribute name semantic expansion. We first expand attribute candidates with their synsets and then compute the semantic similarity between the elements of the synsets and the relation mentions. The matching results are combined with entity pair information by stacking to perform relation mapping. Experimental results show that the average accuracy of our method reaches 0.744 for relation mapping on the PATTY dataset, an improvement of 0.245 over current methods that rely on entity pairs.
2. Generative Model for Relation Mapping. Open Information Extraction (Open IE) can extract domain-independent relational triples from natural language text while preserving the variety and richness of the language. However, open IE didn’t recognize the exact semant...
Keywords	Semantic Inference of Relation Mentions; Relation Mapping; Open Information Extraction; Knowledge Base
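Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6642
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	刘芳 (Liu Fang). 面向知识库的实体关系语义映射技术研究 (Research on Mapping Entity Relation Mentions to Semantic Items in Knowledge Base)[D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2014.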
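
Illustrative note on contribution 2 of the abstract: the abstract names three features (relation popularity, how strongly an entity pair indicates a relation, and mention/entity-pair co-occurrence) and says a generative model fuses them, but this record does not give the model itself. The sketch below shows one way such a combination could look, scoring each attribute relation r for a mention m as P(r) · Σ_e P(e|r) · P(m|e). The factorization, the function names `train` and `score`, and the toy triples are all assumptions made for illustration, not the thesis's actual formulation.

```python
from collections import Counter, defaultdict


def train(kb_triples, open_triples):
    """Collect the counts behind the three features (hypothetical helper).

    kb_triples:   (subject, attribute, object) triples from a knowledge base
    open_triples: (subject, mention, object) triples from Open IE output
    """
    rel_count = Counter()                      # popularity of each attribute relation
    pair_given_rel = defaultdict(Counter)      # entity pairs seen with each relation
    mention_given_pair = defaultdict(Counter)  # mentions co-occurring with each pair
    for s, r, o in kb_triples:
        rel_count[r] += 1
        pair_given_rel[r][(s, o)] += 1
    for s, m, o in open_triples:
        mention_given_pair[(s, o)][m] += 1
    return rel_count, pair_given_rel, mention_given_pair


def score(mention, rel_count, pair_given_rel, mention_given_pair):
    """Return normalized scores P(r) * sum_e P(e|r) * P(m|e) per relation.

    This factorization is an assumed, simplified stand-in for the thesis model.
    """
    total = sum(rel_count.values())
    scores = {}
    for r, n_r in rel_count.items():
        p_r = n_r / total
        acc = 0.0
        for pair, n_pair in pair_given_rel[r].items():
            mentions = mention_given_pair.get(pair)
            if not mentions:
                continue  # pair never observed in the open triples
            p_e_given_r = n_pair / n_r
            p_m_given_e = mentions[mention] / sum(mentions.values())
            acc += p_e_given_r * p_m_given_e
        scores[r] = p_r * acc
    z = sum(scores.values())
    return {r: v / z for r, v in scores.items()} if z > 0 else scores


# Toy data, invented for illustration only.
kb = [("Obama", "birthPlace", "Honolulu"),
      ("Einstein", "birthPlace", "Ulm"),
      ("Obama", "residence", "Washington")]
open_ie = [("Obama", "was born in", "Honolulu"),
           ("Einstein", "was born in", "Ulm"),
           ("Obama", "lives in", "Washington")]

model = train(kb, open_ie)
result = score("was born in", *model)
print(max(result, key=result.get))  # prints "birthPlace"
```

The sketch uses raw counts with no smoothing, so a mention that never co-occurs with any knowledge-base entity pair receives a score of zero for every relation; a practical system would need smoothing or a fallback such as the name-similarity signal described in contribution 1.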

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20101801462804 (2161KB)			Not yet open	CC BY-NC-SA