Research on Key Technologies for Eventuality Commonsense Knowledge Acquisition and Knowledge Internalization in Language Models
王晨皓
2024-05-14
Pages: 112
Subtype: Doctoral dissertation
Abstract

Commonsense is an indispensable part of how people understand language, exchange views, and plan actions. In the field of artificial intelligence, endowing machines with human-like commonsense abilities has long been an important research topic. Over the past few decades, researchers have explored how to acquire commonsense knowledge usable by machines and how to apply it in concrete scenarios, producing a series of representative knowledge resources. However, in certain areas such as eventuality commonsense knowledge, existing knowledge resources remain insufficient in scale and coverage and lack good structure, often failing to meet the needs of practical applications. Meanwhile, with the progress of deep learning and the rise of pre-trained language models, commonsense knowledge research faces new opportunities and challenges. On the one hand, large-scale pre-trained language models have learned vast amounts of factual and commonsense knowledge from massive corpora; this knowledge resides implicitly in the models' parameters and has the potential to serve as a brand-new source for commonsense knowledge acquisition. On the other hand, pre-trained language models are becoming the foundation of natural language applications, and it is necessary to explore combining commonsense knowledge with them, "internalizing" the external explicit commonsense knowledge that models need so as to better serve downstream applications.

Against this background, this thesis focuses on two themes, the acquisition of eventuality commonsense knowledge and knowledge internalization in language models, and studies key technologies in four areas: structuring eventuality commonsense knowledge, automatically acquiring Chinese eventuality commonsense knowledge, generating rationales for commonsense question answering, and lightweight updating of models' intrinsic knowledge. The main contributions and innovations of this thesis are summarized as follows:

1. Acquisition of Structured Eventuality Commonsense Knowledge Based on Multi-Source Knowledge Fusion

Mainstream commonsense knowledge resources are represented as loosely structured free-form knowledge graphs whose nodes are natural-language phrases without explicit semantic structure; they suffer from knowledge redundancy and sparse connectivity, and lack links to structured linguistic and world knowledge. To build a well-structured eventuality commonsense knowledge resource and strengthen its connections with linguistic and world knowledge, this thesis proposes a multi-layer knowledge system centered on semantic frames, which links and fuses eventuality commonsense knowledge with linguistic and world knowledge. Specifically, the event phrases involved in eventuality commonsense knowledge are first given frame-semantic parses to obtain explicit semantic structures; on this basis, synonymous nodes in eventuality commonsense resources are merged and linked to the frame taxonomy defined in linguistic knowledge resources; with frames as the medium, connections to world-knowledge instances are then established, forming a fusion of multi-layer knowledge. Finally, this thesis builds and releases a multi-source knowledge graph, CogNet, which takes more than a thousand semantic frames as its core and threads together over eighty thousand restricted frames converted from event nodes in commonsense knowledge graphs, nearly three hundred thousand eventuality commonsense statements, and more than thirty million event instances from world knowledge resources, supporting access to and use of the multi-source knowledge through a unified query engine.

2. Acquisition of Chinese Eventuality Commonsense Knowledge Based on Pre-trained Language Models

Because commonsense knowledge is difficult and costly to acquire, existing commonsense knowledge resources remain very limited in scale and coverage, concentrate mainly on conceptual commonsense knowledge, and are English-centric. For the problem of automatically acquiring Chinese eventuality commonsense knowledge, this thesis proposes a method based on pre-trained language models: multi-step prompt learning induces the language model to generate commonsense knowledge, combined with strategies such as generation by category, cascade filtering, and bootstrapping iteration to improve the quality and diversity of the acquired knowledge when the pre-trained language model's capability is limited. With this method, large-scale automatic acquisition of Chinese eventuality commonsense knowledge is realized, and a Chinese commonsense knowledge graph, CN-AutoMIC, is built and released; its high-quality subset contains 1.1 million commonsense knowledge triples with a sampled accuracy of 87.2%, and it captures commonsense knowledge specific to the Chinese-speaking community.

3. Knowledge Internalization in Language Models for Rationale Generation in Commonsense Question Answering

Commonsense question answering is the most typical application and evaluation setting for commonsense knowledge. However, automatically supplying useful commonsense knowledge as a rationale for answering is not easy: existing methods are often limited by cumbersome retrieval pipelines or depend on large language models that are hard to adapt. For the specific need of automatically obtaining rationales in commonsense question answering, this thesis proposes a knowledge internalization method that trains a rationale generation model on synthetic data. Specifically, question-answering data are first synthesized from symbolic commonsense knowledge bases, and a large language model is induced to supplement question-specific rationales; the synthesized rationale data then serve as "practice material" for training a small-scale rationale generation model. In this way, the model learns both rich commonsense knowledge and how to produce rationales useful for a given question. Experimental results on five commonsense question answering benchmarks show that the model not only significantly enhances various question-answering models in the zero-shot setting, but can also further optimize rationale generation from question-answering model feedback when real task training data are available.

4. Knowledge Internalization in Language Models for Lightweight Knowledge Updating

Language models are becoming a new carrier of knowledge, but in practice the factual and commonsense knowledge inside a model can be wrong, missing, or outdated. To avoid the high cost of retraining from scratch, it is necessary to study lightweight updating of a model's intrinsic knowledge, which has given rise to the research field of model editing. However, existing model editing research is largely confined to triple-based edits, whereas new knowledge in the real world usually arrives as free text. For the more general problem of lightweight updating of model-intrinsic knowledge, this thesis proposes a model-editing benchmark that takes free text as the editing request. The benchmark uses descriptions of newly emerged events and entities as editing requests and provides knowledge probes at multiple generalization levels, enabling in-depth analysis of how well language models internalize new knowledge. Inspired by the experimental observations on this benchmark, the thesis further proposes a lightweight knowledge-updating method for free text that exploits the model's innate in-context learning ability and uses in-context distillation to effectively update the model's intrinsic knowledge. Experiments show that, compared with other model editing methods, this method performs better on probes at higher generalization levels.

Other Abstract

Commonsense is an essential part of people's understanding of language, communication of views, and planning of actions. In the field of artificial intelligence, endowing machines with human-like commonsense abilities has always been an important research topic. Over the past few decades, researchers have embarked on a long-term exploration of how to acquire commonsense knowledge usable by machines and how to utilize such knowledge in specific scenarios, resulting in a series of representative knowledge resources. However, in some specific areas, such as eventuality commonsense knowledge, the scale and coverage of existing knowledge resources are still insufficient, and they lack good semantic normalization, often failing to meet the needs of practical applications. Meanwhile, with the advancement of technologies such as deep learning and pre-trained language models, commonsense knowledge research is facing new opportunities and challenges. On the one hand, large-scale pre-trained language models have learned a vast amount of factual and commonsense knowledge on massive corpora, which implicitly exists within the models' parameters and has the potential to serve as a new source of commonsense knowledge acquisition. On the other hand, as pre-trained language models are becoming the foundation of natural language applications, there is a need to explore the integration of commonsense knowledge with pre-trained language models, enabling the models to internalize the required external knowledge and better serve downstream applications.

In this context, this thesis focuses on two main themes: acquiring eventuality commonsense knowledge and internalizing knowledge within language models, and addresses four specific research questions: semantic normalization of eventuality commonsense knowledge, automatic acquisition of Chinese eventuality commonsense knowledge, generation of rationales in commonsense question answering, and lightweight updates of model-intrinsic knowledge. The main contributions and innovations of this thesis are summarized as follows:


1. Acquisition of Semantically Normalized Eventuality Commonsense Knowledge Based on the Integration of Multiple Knowledge Resources

Current mainstream commonsense knowledge resources are represented as free-form knowledge graphs, often lacking a clear semantic hierarchy and containing a large number of repetitive and redundant expressions. To build semantically normalized commonsense knowledge resources and strengthen their connections with language and world knowledge, this thesis proposes a multi-layer knowledge system centered around semantic frames, linking representative language, world, and commonsense knowledge resources. Specifically, phrases involved in eventuality commonsense knowledge are first converted into clearer semantic structures through frame semantic parsing. Then, synonymous nodes in eventuality commonsense knowledge resources are merged and linked to the frame taxonomy. Using semantic frames as a medium, links to world knowledge instances are further established. Finally, this thesis constructs and releases a multi-source knowledge graph, CogNet. It has over a thousand semantic frames at its core and integrates over eighty thousand frames with element restrictions, nearly three hundred thousand commonsense knowledge statements, and over thirty million event and entity knowledge instances, supporting access to and utilization of the multiple knowledge layers through a unified query engine.
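
As a rough illustration of this frame-centered multi-layer organization, the sketch below models a frame node that ties restricted frames, commonsense statements, and world-knowledge instances together behind a single query entry point. All class and field names are hypothetical, not CogNet's actual schema.

```python
from dataclasses import dataclass, field

# Illustrative data model for a frame-centered multi-layer knowledge
# graph. All names are hypothetical, not CogNet's actual schema.

@dataclass
class RestrictedFrame:
    """A frame whose elements carry lexical restrictions, e.g. the
    frame 'Commerce_buy' restricted to Buyer = 'student'."""
    frame: str            # parent frame in the taxonomy
    restrictions: dict    # frame element -> restricting phrase

@dataclass
class FrameNode:
    """One semantic frame tying the three knowledge layers together."""
    name: str
    restricted_frames: list = field(default_factory=list)  # from commonsense event nodes
    assertions: list = field(default_factory=list)         # (head, relation, tail) triples
    instances: list = field(default_factory=list)          # world-knowledge event/entity IDs

class UnifiedQueryEngine:
    """Toy unified query: look up a frame, then fan out to every
    knowledge layer attached to it."""
    def __init__(self, frames):
        self.frames = frames  # name -> FrameNode

    def query(self, frame_name):
        node = self.frames[frame_name]
        return {"restricted_frames": node.restricted_frames,
                "commonsense": node.assertions,
                "world_instances": node.instances}
```

Under this organization, a single query for a frame would return its restricted variants, commonsense statements, and instance links in one call, which is the role the unified query engine plays above.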


2. Acquisition of Chinese Eventuality Commonsense Knowledge Based on Pre-trained Language Models

Due to the difficulty and high cost of acquiring commonsense knowledge, the scale and coverage of existing commonsense knowledge resources remain very limited, concentrated mainly on conceptual commonsense knowledge, and exhibit an English-centric bias. To expand the scale of available eventuality commonsense knowledge and capture commonsense knowledge unique to the Chinese-speaking community, this thesis proposes a method for acquiring Chinese eventuality commonsense knowledge through generation with pre-trained language models. However, languages other than English lack highly capable pre-trained language models, making it difficult to ensure the quality and diversity of the generated results. To address this challenge, this thesis introduces mechanisms such as generation by category, cascade filtering, and bootstrapping iteration to improve generation efficiency, quality, and diversity under the constraints of limited model capability. Finally, based on the proposed approach, this thesis constructs and releases a Chinese commonsense knowledge graph, CN-AutoMIC, whose high-quality subset contains 1.1 million commonsense knowledge triples with an accuracy of 87.2% in sampled evaluation.
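
The acquisition pipeline can be pictured as a generate-filter-bootstrap loop. The sketch below is a minimal rendering of that loop under assumed interfaces: `generate` stands in for the prompted language model and `score` for a learned filter; neither is the thesis's actual implementation.

```python
# Sketch of the generate / filter / bootstrap loop. `generate` stands
# in for the prompted language model and `score` for a learned filter;
# both are assumed interfaces, not the thesis's implementation.

def acquire_triples(seed_heads, relations, generate, score,
                    rounds=3, threshold=0.8):
    """Iteratively grow a commonsense KG: generate tails per relation
    category, keep only high-scoring triples, then feed accepted tails
    back in as new heads (bootstrapping iteration)."""
    kg, frontier = set(), list(seed_heads)
    for _ in range(rounds):
        candidates = [(head, rel, tail)
                      for head in frontier
                      for rel in relations        # generation by category
                      for tail in generate(head, rel)]
        # cascade filtering: cheap surface checks first, scorer last
        accepted = [t for t in candidates
                    if t[2] and t[2] != t[0] and score(t) >= threshold]
        kg.update(accepted)
        frontier = [tail for _, _, tail in accepted]
    return kg
```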


3. Knowledge Internalization in Language Models Aimed at Generating Rationales for Commonsense Question Answering

The commonsense question answering task is one of the most typical applications and evaluation settings for commonsense knowledge. However, it is non-trivial to automatically acquire useful commonsense knowledge as a rationale for commonsense question answering: existing methods are often limited by cumbersome retrieval processes or depend on large language models that are difficult to adapt. To address the need for automatic rationale generation in commonsense question answering and fill the gaps left by previous methods, this thesis proposes a knowledge internalization framework that trains a rationale generation model on synthetic data. Specifically, the framework first utilizes symbolic commonsense knowledge bases to synthesize question-answering data, and then induces large language models to supplement question-specific rationales. After that, a small-scale rationale generation model is trained on the synthesized data, enabling it to learn both rich commonsense knowledge and the ability to provide useful rationales for question answering. Empirical results on five commonsense question answering benchmarks show that the model not only significantly enhances the performance of various question-answering models under zero-shot conditions, but can also be further improved based on feedback from the question-answering models when task-specific training data are available.
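
The data-synthesis step might look like the following sketch: KB triples become questions via relation-specific templates, and a large model is prompted for a question-specific rationale. `ask_llm` and the ATOMIC-style templates are illustrative assumptions, not the framework's real interfaces.

```python
# Sketch of the data-synthesis step: KB triples become questions via
# relation-specific templates, and a large model is prompted for a
# question-specific rationale. `ask_llm` and the ATOMIC-style
# templates are illustrative assumptions.

QUESTION_TEMPLATES = {
    "xNeed":   "What does {head} need to do before this?",
    "xEffect": "What happens to {head} as a result?",
}

def synthesize_rationale_data(triples, ask_llm):
    examples = []
    for head, relation, tail in triples:
        template = QUESTION_TEMPLATES.get(relation)
        if template is None:
            continue
        question = template.format(head=head)
        prompt = (f"Question: {question}\nAnswer: {tail}\n"
                  "Briefly explain why this answer is correct:")
        rationale = ask_llm(prompt)   # the LLM supplements the rationale
        examples.append({"question": question,
                         "answer": tail,
                         "rationale": rationale})
    return examples  # training data for the small rationale generator
```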


4. Knowledge Internalization in Language Models Aimed at Lightweight Knowledge Updating

Language models are gradually becoming a new carrier of knowledge. But in practical applications, the intrinsic knowledge of language models may become incorrect, missing, or outdated. To avoid the high cost of completely retraining models, it is necessary to study how to update a model's intrinsic knowledge in a lightweight manner, which has given rise to the field of model editing. However, existing research is mainly limited to triple-based editing formats, whereas new knowledge in the real world usually appears as free text. To study lightweight knowledge updating under these more general conditions, this thesis proposes a multi-level benchmark for free-text model editing. It uses free-form descriptions of newly emerged events and entities as editing requests and provides a variety of knowledge probes at different generalization levels, allowing for a deep analysis of the effectiveness of existing model editing methods. Inspired by the experimental observations on this benchmark, this thesis proposes a method based on in-context distillation that exploits the model's innate in-context learning ability. Compared to other model editing methods, this approach performs better on probes at higher generalization levels.
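
A minimal sketch of in-context distillation for a single free-text edit is shown below, assuming a Hugging Face-style causal LM: the frozen teacher pass reads the edit text in-context, while the student pass (whose gradients update the weights) must answer the probe without it. Matching only the final token's distribution is a simplification for illustration, not the thesis's exact objective.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of in-context distillation for one free-text edit,
# assuming a Hugging Face-style causal LM. The frozen teacher pass
# reads the edit text in-context; the student pass (the weights being
# updated) must answer the probe without it.

def distill_edit_step(model, tokenizer, edit_text, probe, optimizer):
    with torch.no_grad():                               # teacher pass
        t_ids = tokenizer(edit_text + "\n" + probe,
                          return_tensors="pt").input_ids
        teacher = model(t_ids).logits[:, -1, :]

    s_ids = tokenizer(probe, return_tensors="pt").input_ids
    student = model(s_ids).logits[:, -1, :]             # student pass

    loss = F.kl_div(F.log_softmax(student, dim=-1),
                    F.softmax(teacher, dim=-1),
                    reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```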

Keywords: Commonsense Knowledge; Commonsense Knowledge Acquisition; Knowledge Extraction from Language Models; Knowledge Internalization in Language Models; Commonsense Question Answering
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/57405
Collection: Graduates - Doctoral Dissertations
Recommended Citation
GB/T 7714
王晨皓. 事件类常识知识获取与语言模型知识内化关键技术研究 [Research on Key Technologies for Eventuality Commonsense Knowledge Acquisition and Knowledge Internalization in Language Models] [D], 2024.
Files in This Item:
File Name/Size: 王晨皓 博士毕业论文最终版.pdf (5599KB) · DocType: Thesis · Access: Restricted · License: CC BY-NC-SA