Research on Key Technologies for Document-level Entity Relation Extraction

	Research on Key Technologies for Document-level Entity Relation Extraction
	许豹 (Xu Bao)
	2023-05-22
Pages	80
Degree type	Master's
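Chinese abstract (translated)	Entity relation extraction aims to extract the semantic relations that hold between entity pairs in text and is an important research direction in natural language processing. Depending on the length of the text in which the entity pair occurs, the task is divided into sentence-level entity relation extraction (SERE) and document-level entity relation extraction (DERE). This thesis studies document-level entity relation extraction, whose goal is to extract the semantic relations between entities from a document containing multiple sentences. The task has two characteristics: 1) a document contains multiple entities, some of which have several coreferent mentions, so an entity representation must combine multiple mentions; 2) entities are scattered throughout the document, so determining a relation requires attending to the important nodes in the text and reasoning over them. How effectively a model uses an entity's different mentions and selects the key nodes in the text strongly affects the accuracy of relation inference. Existing methods make insufficient use of entity mentions and pay insufficient attention to key sentences, so their extraction performance falls short of practical needs. To address these problems, the main work of this thesis is as follows:
1. Document-level entity relation extraction based on coreference resolution: Entities in a document usually have multiple mentions, which can be divided by type into noun-phrase mentions and pronoun mentions. Existing methods build entity representations from noun-phrase mentions only and ignore pronoun mentions. Pronoun mentions make up a large share of entity mentions; using them enriches the entity representation and also shortens the distance between entities, which lowers the difficulty of relation reasoning and improves inference. This thesis therefore proposes a document-level entity relation extraction method based on coreference resolution. It uses a pointer-generator network to align the noun-phrase mentions and pronoun mentions in the document with their entities, then uses the aligned pronoun and noun-phrase mentions together as entity representations to build a text graph and infer entity relations. The method reaches an F1 score of 62.46% on the DocRED dataset, 1.42% above the baseline model, which is the best result reported and demonstrates the method's effectiveness.
2. Document-level entity relation extraction based on the information bottleneck: In document-level entity relation extraction, the text contains multiple sentences, entities, and mentions, and deciding a relation requires combining information from these different levels for logical reasoning. Existing methods use the entire document as the basis for inference, so redundant data introduces noise that degrades relation reasoning. This thesis therefore proposes a document-level entity relation extraction method based on the information bottleneck. It builds a text graph whose nodes are entity mentions and sentences and, when updating node information in the graph, applies a graph stochastic attention method based on the information bottleneck so that the model automatically attends to the sentence and mention nodes relevant to relation classification. This reduces the negative effect of irrelevant information and improves extraction performance. Compared with a graph attention method that uses the complete graph, the method improves F1 on the DocRED dataset by 1.85%, demonstrating its effectiveness.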
English abstract	Entity relation extraction aims to extract the semantic relations that hold between entity pairs in text. It is an important research direction in natural language processing. Depending on the length of the text in which the entity pair occurs, it can be divided into sentence-level entity relation extraction (SERE) and document-level entity relation extraction (DERE). This thesis mainly studies document-level entity relation extraction, which requires extracting the semantic relations between different entities from a document containing multiple sentences. The task has the following characteristics: 1) A document contains multiple entities, and some entities have multiple coreferent mentions, so the entity representation needs to combine multiple mentions. 2) Entities are scattered throughout the document, and judging the relation between entities requires attending to important nodes in the text and performing logical reasoning. How effectively the different mentions of an entity are used and the key nodes in the text are selected has a significant impact on the accuracy of relation inference. Existing methods still make insufficient use of entity mentions and pay insufficient attention to key sentences, so their extraction performance does not meet practical requirements. To address these issues, the main work of this thesis is as follows:
1. Document-level entity relation extraction based on coreference resolution: Entities in a document usually have multiple mentions, which can be divided into noun-phrase mentions and pronoun mentions according to the type of mention. Existing methods use only noun-phrase mentions and ignore pronoun mentions when building entity representations. Pronoun mentions account for a large proportion of entity mentions, and using them enriches entity representations while also shortening the distance between entities, thereby reducing the difficulty of relation reasoning and improving the model's inference performance. This thesis therefore proposes a document-level entity relation extraction method based on coreference resolution. A pointer-generator network is used to align the noun-phrase mentions and pronoun mentions in the text with their entities, and the aligned pronoun and noun-phrase mentions are then used together as entity representations to construct a text graph and infer entity relations. The method achieves an F1 score of 62.46% on the DocRED dataset, exceeding the baseline model by 1.42% and achieving the best result, which demonstrates its effectiveness.
2. Document-level entity relation extraction based on the information bottleneck: In document-level entity relation extraction, the text contains multiple sentences, entities, and mentions, and identifying entity relations requires combining these different levels of information for logical reasoning. Existing methods use all of the document's information as the basis for inference, so data redundancy introduces noise that interferes with relation reasoning. This thesis therefore proposes a document-level entity relation extraction method based on the information bottleneck, which uses entity mentions and sentences as nodes to construct a text graph. When node information in the graph is updated, a graph stochastic attention method based on the information bottleneck is used so that the model automatically focuses on the sentence and mention nodes relevant to relation classification, thereby reducing the negative impact of irrelevant information and improving extraction performance. Compared with a graph attention method that uses the complete graph, this method improves the F1 score by 1.85% on the DocRED dataset, demonstrating its effectiveness.
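Keywords	document-level entity relation extraction; coreference resolution; graph neural network; information bottleneck
Language	Chinese
Seven major directions: sub-direction classification	Natural language processing
State Key Laboratory planned direction classification	Speech and language processing
Is there an associated dataset that needs to be deposited	No
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/51921
Collection	Graduates_Master's theses
Recommended citation (GB/T 7714)	许豹. 篇章级实体关系抽取关键技术研究[D],2023.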
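
The abstract's first contribution argues that counting resolved pronoun mentions shortens the distance between entities. The following is only a toy sketch of that argument, not the thesis's model: the document, the entity names, and the function `min_sentence_distance` are all hypothetical, and coreference is assumed to be already resolved.

```python
from itertools import product

# Hypothetical toy document: each mention is (entity, sentence_index, kind).
# The pronoun in sentence 3 is assumed to be already resolved to "Marie Curie".
MENTIONS = [
    ("Marie Curie", 0, "noun"),
    ("Marie Curie", 3, "pronoun"),
    ("Sorbonne", 3, "noun"),
]


def min_sentence_distance(mentions, head, tail, use_pronouns):
    """Smallest sentence gap between any mention of head and any mention of tail."""
    def sentences(entity):
        return [s for e, s, kind in mentions
                if e == entity and (use_pronouns or kind == "noun")]

    pairs = product(sentences(head), sentences(tail))
    return min(abs(a - b) for a, b in pairs)


# Noun-phrase mentions only: the closest pair is three sentences apart.
without_pronouns = min_sentence_distance(
    MENTIONS, "Marie Curie", "Sorbonne", use_pronouns=False)
# With the resolved pronoun, both entities are mentioned in the same sentence.
with_pronouns = min_sentence_distance(
    MENTIONS, "Marie Curie", "Sorbonne", use_pronouns=True)
print(without_pronouns, with_pronouns)
```

In this toy case the entity pair goes from a cross-sentence relation to an intra-sentence one once the pronoun is counted as a mention, which is the effect the abstract describes.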

Files in this item
File name/size	Document type	Version type	Access type	License
篇章级实体关系抽取关键技术研究.pdf (2936KB)	Thesis		Restricted access	CC BY-NC-SA
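
The abstract's second contribution describes stochastic attention over a graph of mention and sentence nodes, regularized by an information bottleneck. This record does not give the formulation, so the sketch below assumes one common form of that idea: each edge gets a keep probability, edges are sampled from a Bernoulli distribution, and a KL term pulls the keep probabilities toward a fixed prior. The node names, the probabilities, and the prior value are hypothetical, and a trainable model would need a differentiable relaxation in place of the hard sampling shown here.

```python
import math
import random


def bernoulli_kl(p, r):
    """KL(Bern(p) || Bern(r)) for probabilities strictly between 0 and 1."""
    return p * math.log(p / r) + (1 - p) * math.log((1 - p) / (1 - r))


def sample_edge_mask(keep_probs, rng):
    """Keep each edge independently with its own probability."""
    return {edge: rng.random() < p for edge, p in keep_probs.items()}


# Hypothetical keep probabilities, as an attention module might produce them.
keep_probs = {
    ("mention:A1", "sent:0"): 0.9,
    ("mention:A2", "sent:3"): 0.7,
    ("sent:1", "sent:2"): 0.2,
}
PRIOR = 0.5  # assumed prior keep probability

# Bottleneck-style penalty: mean KL between each edge distribution and the prior.
penalty = sum(bernoulli_kl(p, PRIOR) for p in keep_probs.values()) / len(keep_probs)

# One stochastic subgraph; a seeded generator keeps the sketch reproducible.
mask = sample_edge_mask(keep_probs, random.Random(0))
print(round(penalty, 4), mask)
```

Under this assumption the penalty is zero only when every keep probability equals the prior, so the model pays a cost for each edge it decides to keep or drop with confidence, which encourages it to commit only on edges that help the relation prediction.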