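(1) This thesis proposes a topic extraction model based on a recurrent neural network and a two-dimensional convolutional neural network. The topic of a text summarizes its main semantic information, and extracting this topic information helps to extract domain-specific knowledge and to define the relation type system; it is also the foundation of knowledge extraction. For this task, the thesis designs a topic mining model based on a long short-term memory (LSTM) network and a two-dimensional convolutional neural network, which can effectively mine the semantic information of sentences. Experiments on several related tasks show that the proposed model has a clear advantage in topic extraction.
(4) This thesis proposes a method for the joint extraction of entities and relations based on a hybrid neural network. Traditional pipelined knowledge extraction methods usually treat entity recognition and relation extraction as two independent tasks and ignore the connection between them, while existing joint extraction methods mostly rely on manual feature engineering, which is time-consuming, labor-intensive, and not robust. To address these problems, the thesis proposes a joint entity and relation extraction model based on a long short-term memory network and a convolutional neural network. The method avoids heavy reliance on hand-designed features and strengthens the link between entity extraction and relation extraction. Experimental results on CoNLL04, a commonly used information extraction dataset, show that the proposed method significantly improves entity and relation extraction.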
With the rapid development of the Internet, the volume of text data on the network has increased dramatically. This massive text data contains a wealth of knowledge that can support a variety of intelligent applications, but it also carries a large amount of redundant information, which makes it difficult for people to find what they want. We need to discover knowledge from large-scale text data and convert it into a form that computers can understand. Knowledge extraction aims to solve this problem.
In this thesis, we focus on extracting knowledge from unstructured text given a predefined relation set. To define the relation set and preprocess the text, we first study topic extraction. We then propose three kinds of knowledge extraction methods: a supervised relation extraction method, a distantly supervised relation extraction method, and a joint knowledge extraction method. The main contributions of this thesis are as follows:
First, a hybrid neural network model is proposed for topic extraction. The topic of a text carries its main semantic information, so extracting topic information helps to define the relation set and to extract domain-specific knowledge. We propose a neural model based on two-dimensional convolution and two-dimensional pooling that extracts topic information and captures the semantic information in the text. We conduct experiments on six public datasets, and the results show that the proposed method outperforms most state-of-the-art methods.
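The two operations named above can be illustrated with a minimal sketch in plain Python. It is not the thesis model: the function names, the toy 4x4 input, and the 2x2 filter are assumptions made for illustration, and the input matrix merely stands in for a sentence representation whose rows are time steps and whose columns are feature dimensions.

```python
def conv2d_valid(matrix, kernel):
    """Slide `kernel` over `matrix` without padding (cross-correlation form)."""
    rows, cols = len(matrix), len(matrix[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    out = []
    for i in range(rows - k_rows + 1):
        out_row = []
        for j in range(cols - k_cols + 1):
            total = 0.0
            for di in range(k_rows):
                for dj in range(k_cols):
                    total += matrix[i + di][j + dj] * kernel[di][dj]
            out_row.append(total)
        out.append(out_row)
    return out


def max_pool2d(matrix, pool_rows, pool_cols):
    """Non-overlapping 2D max pooling; incomplete trailing windows are dropped."""
    out = []
    for i in range(0, len(matrix) - pool_rows + 1, pool_rows):
        out_row = []
        for j in range(0, len(matrix[0]) - pool_cols + 1, pool_cols):
            window = [
                matrix[i + di][j + dj]
                for di in range(pool_rows)
                for dj in range(pool_cols)
            ]
            out_row.append(max(window))
        out.append(out_row)
    return out


# Hypothetical toy input: rows are time steps, columns are feature dimensions.
sentence_matrix = [
    [1.0, 0.0, 2.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [2.0, 1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0, 2.0],
]
kernel = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical 2x2 filter

feature_map = conv2d_valid(sentence_matrix, kernel)  # 3x3 feature map
pooled = max_pool2d(feature_map, 2, 2)               # 1x1 after pooling
```

Unlike one-dimensional convolution over the time axis alone, the filter here also moves across the feature dimension, so each output value summarizes a local patch in both directions.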
Second, an attention-based neural network is proposed for supervised relation extraction. Relation extraction identifies the relationship between two given entities in a text and is an important step in pipelined knowledge extraction. The main weakness of most existing methods is that their features are explicitly derived from Natural Language Processing (NLP) tools: errors generated by these tools propagate through the methods, and features constructed for one domain cannot be reused in another. A second weakness is that most existing methods treat all words in the text as equally important, ignoring the fact that keywords matter more to the relation than other words. Based on this analysis, we propose a novel attention-based neural network that extracts relations without any handcrafted features. Experimental results on the SemEval-2010 relation classification task show that the proposed method, using only word embeddings, outperforms most existing methods.
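The idea that keywords should weigh more than other words can be sketched as attention pooling over per-word vectors. The abstract does not give the scoring function, so the form used here (a dot product between an assumed query vector and the tanh of each word vector, followed by a softmax) is only one common choice, and all names and values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(word_vectors, query):
    """Weight each word vector by its attention score and sum them.

    Assumed scoring form: score_t = query . tanh(h_t).
    """
    scores = [
        sum(q * math.tanh(x) for q, x in zip(query, vec))
        for vec in word_vectors
    ]
    weights = softmax(scores)
    dim = len(word_vectors[0])
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, word_vectors))
        for d in range(dim)
    ]
    return weights, pooled


# Hypothetical per-word vectors; the second word plays the role of a keyword.
word_vectors = [[0.1, 0.0], [2.0, 1.5], [0.0, 0.2]]
query = [1.0, 1.0]  # hypothetical attention parameter (learned in a real model)

weights, sentence_vector = attention_pool(word_vectors, query)
```

The weights sum to one, and the word with the highest score dominates the resulting sentence vector, which is the behavior the paragraph above motivates.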
Third, a neural network based on a hierarchical selective attention mechanism is proposed for distantly supervised relation extraction. One challenge in building a machine learning system for relation extraction is generating training examples, because labelling text is time-consuming. Distantly supervised relation extraction methods avoid manual labelling but suffer from the wrong label problem: a sentence that mentions two entities does not necessarily express their relation. To address these problems, we propose a neural network based on a hierarchical attention mechanism, which does not rely on annotated text. We conduct experiments on a widely used dataset, and the results demonstrate that the proposed method performs significantly better than most existing methods.
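The wrong label problem can be made concrete with a small sketch of the usual distant supervision heuristic, in which every sentence mentioning both entities of a knowledge base triple inherits that triple's relation. The knowledge base, the sentences, and the function name below are invented for illustration and are not taken from the thesis.

```python
def distant_label(sentences, knowledge_base):
    """Label each sentence that mentions both entities of a triple with its relation."""
    labeled = []
    for sentence in sentences:
        for head, relation, tail in knowledge_base:
            # Simple substring matching; a real system would use entity linking.
            if head in sentence and tail in sentence:
                labeled.append((sentence, head, tail, relation))
    return labeled


# Hypothetical toy knowledge base and corpus.
knowledge_base = [("Alice", "born_in", "Paris")]
sentences = [
    "Alice was born in Paris.",
    "Alice visited Paris last week.",
    "Paris is a large city.",
]

labeled = distant_label(sentences, knowledge_base)
for sentence, head, tail, relation in labeled:
    print(f"{relation}({head}, {tail}) <- {sentence}")
```

Both of the first two sentences receive the `born_in` label, although only the first expresses it. Noise of this kind is what a selective attention mechanism is meant to down-weight.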
Finally, a hybrid neural network is proposed to jointly extract entities and their relations. Traditional pipelined knowledge extraction methods split the task into two separate subtasks, named entity recognition and relation extraction, and neglect the relevance between them. Moreover, most existing joint methods are feature-based structured systems, which require complicated feature engineering and rely heavily on supervised NLP tools. We propose a hybrid neural network that jointly extracts entities and their relations without using any handcrafted features. Experimental results on the CoNLL04 dataset demonstrate that the proposed model, using only word embeddings as input, achieves state-of-the-art performance.
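Keywords: knowledge extraction, long short-term memory network, convolutional neural network, two-dimensional convolutional neural network, attention mechanism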
GB/T 7714
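Zhou Peng (周鹏). Research on Key Technologies of Knowledge Extraction from Unstructured Text (面向非结构化文本的知识抽取关键技术研究) [D]. Beijing: Graduate University of Chinese Academy of Sciences, 2018.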
File name / size: template_签名页.pdf (5845KB); Document type: Dissertation; Access type: Restricted; License: CC BY-NC-SA
