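Research on Discourse Relation Recognition Methods and Their Applications (篇章关系识别方法研究与应用)
刘洋 (Liu Yang)
Subtype: Doctoral
Thesis Advisor: 宗成庆 (Zong Chengqing)
2019-05
Degree Grantor: University of Chinese Academy of Sciences (中国科学院大学)
Place of Conferral: Institute of Automation (自动化研究所)
Degree Name: Doctor of Engineering
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: natural language understanding; discourse relation recognition; grounded language learning; experience-based discourse causality recognition; expectation-based discourse concession relation recognition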
Abstract

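A series of semantically related sentences is usually called a discourse. In daily life, people habitually use discourse to express their intentions and ideas. Discourse relations (structure) play a very important role here: they link the parts of a discourse so that otherwise isolated segments acquire a whole and coherent meaning, and they directly affect what the discourse expresses. Understanding a discourse accurately requires understanding its discourse relations correctly. Studies have also shown that discourse relation information benefits downstream natural language processing tasks. Discourse relation recognition therefore has important research value both in theory and in practice.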

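At present, mainstream research on discourse relation recognition adopts methods based on textual clues. The core idea is to investigate and analyze the statistical regularities between textual clues and discourse relations, and to build statistical machine learning models that identify the discourse relations in a target discourse. This approach has made some progress but has also run into a series of problems. Studies have pointed out that it cannot reflect the deeper semantics of a discourse, nor does it conform to the facts and characteristics of how humans recognize discourse relations.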

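In response to these problems, this thesis centers on how to construct reasonable and effective discourse relation recognition models, designing and building them by imitating the human process of discourse relation recognition. The main contributions and innovations are summarized as follows: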


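(1) For Chinese implicit discourse relation recognition, a memory-augmented attention neural network model is proposed. The model introduces an attention mechanism to optimize the text representation of the discourse context and uses a memory network to cache the discourse relation context patterns captured during learning, thereby improving the performance of discourse relation classification. Experiments show that the proposed model achieves results comparable to the best models on a public dataset.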

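(2) To address the limitations of current discourse relation recognition models, an experience-based discourse causality recognition model is proposed. Inspired by the human process of discourse understanding, it identifies causal relations in discourse by accumulating experience from different scenes. In the implementation, the model adopts the grounded language learning framework, in which the environment provides experience to the model directly. When the model processes text, the experience stored in memory supplements the shallow semantics of the text. Experiments show that the proposed model significantly outperforms traditional relation recognition models based on contextual clues and is more interpretable.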

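(3) For concession relation recognition, a model based on expectation comparison is proposed. The model imitates human behavior: an expectation is generated from the preceding context and experience, and the concession relation is determined by comparing that expectation with the following context. Experiments show that the proposed model has a clear advantage over existing models based on textual clues and offers good interpretability.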

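In summary, this thesis aims to construct more effective discourse relation recognition models. It introduces the grounded language learning framework to discourse relation recognition and, guided by the viewpoint of cognitive linguistics, constructs corresponding grounded discourse relation datasets and experience-based discourse relation recognition models. It ultimately demonstrates the potential and advantages of this new approach, offers new ideas for the field, and strongly advances research in this area.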

Other Abstract

A series of semantically related sentences is commonly called a discourse. In daily life, people habitually use discourse to express their intentions and ideas. Within a discourse, discourse relations (structure) play a very important role: they link the different parts of the discourse so that otherwise isolated segments form a coherent whole, and they directly influence the meaning the discourse conveys. To comprehend a discourse accurately, one must therefore first understand its discourse relations. In addition, previous studies have pointed out that discourse relation information benefits downstream natural language processing tasks. Discourse relation recognition therefore has important research value both in theory and in practice.

At present, mainstream research on discourse relation recognition adopts a methodology based on textual clues. Its core idea is to investigate and analyze the statistical patterns between textual clues and discourse relations, and to construct statistical machine learning models that identify the discourse relations in a target discourse. This approach has made some progress, but it has also encountered a series of problems. Some studies have pointed out that it neither reflects the deeper semantics of the discourse context nor conforms to the facts and characteristics of how humans understand discourse relations.


To address these problems, this thesis focuses on how to construct reasonable and effective discourse relation recognition models, designing and building them by imitating the process by which humans recognize discourse relations. The contributions of this thesis are summarized as follows:

(1) For Chinese implicit discourse relation recognition, we propose a memory-augmented attention-based neural network model. The model introduces an attention mechanism to optimize the representation of the discourse context and uses a memory network to cache the discourse relation context patterns captured during learning, thereby improving the performance of discourse relation classification. Experiments show that the proposed model achieves performance comparable to the best models on a public dataset.

(2) Current mainstream models and methods rely on context information alone for discourse relation recognition. To address this limitation, we propose an experience-based discourse causality recognition model inspired by the human process of discourse understanding. It first accumulates experience in different scenes and then uses that experience to identify causal relations in the discourse context. In the implementation, we adopt a grounded language learning framework, in which the environment directly provides experience to the model. When the model processes text, the experience stored in memory supplements the shallow semantics of the text. Experiments show that the proposed model significantly outperforms traditional text-based recognition models and is more interpretable.

(3) For concession relation recognition, we propose a model based on expectation comparison. The model imitates how humans understand concession: people first generate expectations from the preceding context and their experience, then compare those expectations with the following context to identify the concession relation. Experiments show that the proposed model has clear advantages over existing text-based models, and further analysis indicates that it has good interpretability.
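
As a rough illustration of the expectation-comparison idea in contribution (3), the following toy sketch flags a concession when the observed continuation diverges from an expected one. It is not the thesis's model: the vector representations, the cosine-similarity comparison, the `threshold` parameter, and the function names `cosine` and `violates_expectation` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def violates_expectation(expected, observed, threshold=0.0):
    """Hypothetical decision rule: report a concession when the observed
    continuation is less similar to the expectation than `threshold`."""
    return cosine(expected, observed) < threshold


# Assumed toy vectors: `expected` stands in for what the preceding context
# and prior experience would predict; the other two are possible continuations.
expected = [1.0, 0.5]
consistent = [0.9, 0.6]   # points the same way as the expectation
contrary = [-1.0, -0.4]   # points the opposite way

print(violates_expectation(expected, consistent))  # False
print(violates_expectation(expected, contrary))    # True
```

In a real system the expectation would be generated by a learned model and the comparison itself would likely be learned; the fixed threshold here only makes the comparison step concrete.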

In summary, this thesis is committed to constructing more effective discourse relation recognition models. In particular, we introduce a grounded language learning framework to discourse relation recognition and, guided by cognitive linguistics, construct corresponding grounded discourse relation datasets and experience-based discourse relation recognition models. We demonstrate the potential and advantages of this new approach, which offers new ideas for the field and strongly promotes research in this area.

Pages: 116
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/23993
Collection: National Laboratory of Pattern Recognition (模式识别国家重点实验室) / Natural Language Processing
Recommended Citation
GB/T 7714
刘洋. 篇章关系识别方法研究与应用[D]. 自动化研究所. 中国科学院大学,2019.
Files in This Item:
File Name/Size: Thesis_submission.pd (5412KB)
DocType: Thesis
Access: Open Access
License: CC BY-NC-SA