CASIA OpenIR  > 毕业生  > 博士学位论文
基于解释增强的预训练语言模型知识利用关键技术研究
杨朝
2024-05-14
Pages: 122
Subtype: Doctoral
Abstract

知识作为人工智能技术的基础设施,对于语义理解和深度推理具有关键支撑作用。从人工智能概念出现伊始,研究者们一直致力于构建符号化知识库,分别从语言、实体、事件、规则、常识等不同侧面刻画和构建不同类型的知识库。近些年,预训练语言模型技术飞速发展,通过在海量数据上的预训练,预训练模型内部也已经习得大量参数化数值表示的知识。在当前以数据驱动为核心的人工智能时代,如何在神经网络模型中有效利用知识(包括符号化知识和参数化知识)是当前自然语言处理的核心难点之一。

本论文主要面向预训练语言模型中的知识利用问题,分别从知识迁移、知识筛选、知识激活三个具体任务开展研究。知识迁移任务是将一个模型的参数化知识迁移到另一个模型中来,知识筛选任务指的是从注入预训练语言模型的符号化知识中筛选出对模型推理有帮助的知识,而知识激活任务则是激活模型内化的参数化知识来增强模型在相应下游任务上的性能。针对上述任务,已有方法主要依据模型输入、输出两端数据约束模型的学习和优化过程,进而实现不同表示形式的知识的有效利用。然而,由于神经网络模型本身是一个包含海量参数的黑盒模型,上述知识利用策略仅仅实现了对于输入、输出数据的拟合,忽略了知识在模型当中的运行过程和运行方式,导致已有知识利用方法普遍存在性能低、不鲁棒、泛化差的问题。

针对这一问题,本论文尝试从可解释性角度出发,以问答、文本分类等具体自然语言处理任务为验证手段,探究基于解释增强的预训练语言模型知识利用方法。通过解释方法探究预训练语言模型在不同知识利用任务中的运行机制,根据解释结果提出改进方案来纠正其中的不合理行为,进而提升预训练语言模型在下游任务上的性能。本文主要研究内容和创新点总结如下:

1.基于解释增强的预训练语言模型参数化知识迁移方法

在预训练语言模型的知识迁移中,一个核心难题在于判断现有知识蒸馏方法是否能够完整地将教师模型的参数化知识迁移到学生模型。本研究从可解释性的视角出发,运用解释技术来分析学生模型与教师模型推理过程的相似性。研究发现在已有知识迁移方法中,学生模型虽然能够拟合教师模型的输出分布,但并未真正掌握教师模型的运行方式。这说明了学生模型在域内测试中表现良好,而在更具挑战性的域外测试等方面出现性能下降的原因。针对这一问题,本研究提出了一种基于解释指导的知识蒸馏框架。该框架整合了现有的三种主要解释方法,用来揭示模型的推理方式,并对其中计算复杂度较高的解释方法进行了效率优化,从而提高了整个框架的运行效率。值得一提的是,该框架不受限于模型架构相似性的要求,能够实现不同架构模型之间的知识迁移。在通用自然语言理解评测集上的实验结果验证了所提出方法的有效性。

2.基于信息瓶颈的预训练语言模型外部符号化知识筛选方法

在知识增强的预训练语言模型的研究中,一个关键的挑战在于如何从外部大规模的符号化知识中筛选出对模型推理过程真正有益的知识。由于注入的外部知识往往数量庞大且包含大量冗余,传统知识注入方法可能并未真正有效地提升模型性能。本研究从可解释性的视角出发,发现大部分外部知识并未对推理过程产生实质性帮助。相反,注入少量真正有用的知识,可以进一步提升模型的性能。这一发现突出了知识筛选的重要性。受自解释框架先选择再预测机制的启发,结合互信息对知识图谱此类图结构数据间关联的良好度量,本研究提出了基于信息瓶颈的知识筛选方法。该方法先识别有效的知识,再根据这些知识进行预测。具体来说,通过知识筛选模块选择外部知识,利用正确答案对所选知识打分,结合端到端训练方式使得知识筛选模块能够筛选出支持正确答案的外部知识。同时,该方法针对常识推理任务提出优化目标,并针对优化目标中互信息项不可直接优化的问题,运用变分推断方法推导出可优化的上界便于模型训练。本研究所提出的知识筛选模块轻量且与模型无关,可以集成到任何知识增强模型中。在常识推理任务的多个数据集上进行实验,结果表明向模型仅注入筛选之后的知识在多个知识增强模型上均带来了进一步的性能提升,证明本文所提出的知识筛选模块能够有效去除冗余知识。

3.基于行列式点过程的生成式大语言模型参数化知识激活方法

生成式大语言模型通过指令语句或少量示例,就能够在不更新模型参数的前提下激活并利用模型内部的参数化知识,上下文学习就是其中一种有效的知识激活技术。然而,上下文学习中不同示例的激活效果存在显著差异(从随机猜测到超越全数据微调的预训练语言模型都有可能),因此选择能够有效激活模型知识的示例成为大语言模型的知识激活任务的关键研究问题。为了解决这一问题,本研究提出了一种基于行列式点过程的代表性示例选择方法。首先,基于样本级别归因解释方法,提出了影响得分指标,用以衡量上下文学习中一个示例对其他示例贡献的大小,并基于此评估示例的质量。同时,考虑到上下文学习通常涉及多个示例,本研究引入了多样性指标,并进一步细分为语义多样性和影响多样性。为了综合考虑样本质量、语义多样性和影响多样性这三个关键因素,本研究提出了一种基于两阶段行列式点过程的示例选择方法。在第一阶段,基于行列式点过程选择具有语义多样性的样本;在第二阶段,基于第一阶段的结果,进一步选择高质量且影响多样的样本作为最终示例。在多个生成式大型语言模型和多个自然语言理解任务上的实验结果显示,所提出的示例选择方法能够更有效地激活模型对任务的知识,显著提升上下文学习性能。

Other Abstract

Knowledge serves as the foundational infrastructure of artificial intelligence technology, playing a pivotal role in semantic understanding and deep reasoning. Since the emergence of the concept of artificial intelligence, researchers have been dedicated to constructing symbolic knowledge bases, characterizing and building different types of knowledge repositories from aspects such as language, entities, events, rules, and commonsense. In recent years, pre-trained language models have advanced rapidly, acquiring, through pre-training on massive corpora, a substantial amount of knowledge stored as parameterized numerical representations. In the current data-driven era of artificial intelligence, the effective utilization of knowledge (both symbolic and parameterized) within neural network models is one of the core challenges in natural language processing.

This dissertation primarily addresses the issue of knowledge utilization within pre-trained language models, conducting research through three specific tasks: knowledge transfer, knowledge selection, and knowledge activation. Knowledge transfer migrates the parameterized knowledge of one model into another. Knowledge selection refers to selecting, from the symbolic knowledge injected into a pre-trained language model, the knowledge that aids model reasoning. Knowledge activation activates the model's internalized parameterized knowledge to enhance its performance on corresponding downstream tasks. Existing methods mainly constrain the model's learning and optimization process with data at the input and output ends to utilize knowledge in its different representation forms. However, because neural network models are black boxes with massive numbers of parameters, these strategies merely fit the input and output data while disregarding the processes and mechanisms through which knowledge operates inside the model. As a result, existing knowledge utilization methods generally suffer from low performance, lack of robustness, and poor generalization.

In response to this problem, this dissertation explores knowledge utilization methods for pre-trained language models from an explainability perspective, using specific natural language processing tasks such as question answering and text classification for validation. This study aims to understand the operational mechanisms of pre-trained language models in different knowledge utilization tasks through explanation methods. Based on the explanation results, it proposes improvement techniques to correct unreasonable behaviors observed in the models, thereby enhancing the performance of pre-trained language models on downstream tasks. The main research contents and innovations of this dissertation are summarized as follows:

1. Explanation-enhanced parameterized knowledge transfer method for pre-trained language models

In knowledge transfer for pre-trained language models, a fundamental challenge lies in determining whether existing knowledge distillation methods completely transfer the parameterized knowledge of the teacher model to the student model. From the perspective of interpretability, this study uses explanation methods to analyze the similarity between the reasoning processes of the student and teacher models. The study reveals that while existing knowledge transfer methods enable the student model to fit the output distribution of the teacher model, the student does not truly grasp the teacher's reasoning mechanisms. This explains why the student model performs well on in-domain tests but degrades on more challenging out-of-domain tests. To address this issue, this study proposes an explanation-guided knowledge distillation framework. The framework integrates three existing explanation methods to reveal the model's reasoning process, and optimizes the computationally expensive explanation methods for efficiency, thereby improving the overall efficiency of the framework. Notably, the framework does not require architectural similarity between teacher and student and can transfer knowledge between models with different architectures. Experimental results on the General Language Understanding Evaluation (GLUE) benchmark validate the effectiveness of the proposed method.
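The abstract does not spell out the framework's training objective. As a rough, self-contained sketch of the general idea (matching the teacher's explanations as well as its outputs), one plausible form of an explanation-guided distillation loss is given below; the function names, the cosine-based attribution distance, and the hyperparameters `T` and `alpha` are illustrative assumptions, not the dissertation's actual design:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    m = max(x / T for x in logits)
    exps = [math.exp(x / T - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cosine_distance(u, v):
    """1 - cosine similarity between two attribution vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def distillation_loss(student_logits, teacher_logits,
                      student_attr, teacher_attr,
                      T=2.0, alpha=0.5):
    """Standard output-matching KD term plus an explanation-alignment
    term that pushes the student's token attributions toward the
    teacher's, so the student imitates how the teacher reasons,
    not only what it predicts."""
    kd = kl_divergence(softmax(teacher_logits, T), softmax(student_logits, T))
    expl = cosine_distance(student_attr, teacher_attr)
    return (1 - alpha) * kd + alpha * expl
```

Because the alignment term operates on attribution vectors rather than on hidden states, a loss of this shape places no constraint on the student's architecture, which is consistent with the cross-architecture transfer the abstract describes.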

2. Information bottleneck-based external symbolic knowledge selection method for pre-trained language models

A key challenge in research on knowledge-augmented pre-trained language models is how to select the truly beneficial knowledge from massive external symbolic knowledge. Traditional knowledge injection methods may not effectively improve model performance because the injected external knowledge is vast and often highly redundant. From the perspective of interpretability, this study finds that most external knowledge does not substantially aid the reasoning process; in contrast, injecting a small amount of truly useful knowledge can further improve model performance. This highlights the importance of knowledge selection. Inspired by the select-then-predict mechanism of self-explaining frameworks, and leveraging mutual information as a well-suited measure of the relationships within graph-structured data such as knowledge graphs, this research proposes an information bottleneck-based knowledge selection method. The method first identifies useful knowledge and then makes predictions based on it. Specifically, a knowledge selection module selects external knowledge, the selected knowledge is scored against the correct answers, and end-to-end training enables the module to select the knowledge that supports the correct answers. Furthermore, the method introduces an optimization objective for commonsense reasoning tasks and, because the mutual information terms in this objective cannot be optimized directly, employs variational inference to derive tractable upper bounds for model training. The proposed knowledge selection module is lightweight and model-agnostic, and can be integrated into any knowledge-augmented model. Experimental results on multiple commonsense reasoning datasets demonstrate that injecting only the selected knowledge into various knowledge-augmented models yields further performance improvements, validating that the proposed module effectively removes redundant knowledge.
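The abstract only names the ingredients (an information bottleneck objective with variational bounds on the mutual information terms). In standard variational-information-bottleneck notation, assumed here with $K$ the retrieved knowledge, $K_s$ the selected sub-structure, and $Y$ the answer, such an objective and its tractable surrogates typically take the form:

```latex
% Information-bottleneck objective for knowledge selection (notation assumed):
% retain what predicts the answer, compress away the rest.
\max_{\theta}\; I(K_s; Y) \;-\; \beta\, I(K_s; K)

% The intractable mutual-information terms are replaced by variational
% surrogates: a lower bound on the prediction term via a decoder q_\phi,
I(K_s; Y) \;\ge\; \mathbb{E}_{p(k,y)\, p_\theta(k_s \mid k)}\big[\log q_\phi(y \mid k_s)\big] + H(Y),

% and an upper bound on the compression term via a prior r(k_s):
I(K_s; K) \;\le\; \mathbb{E}_{p(k)}\big[\mathrm{KL}\big(p_\theta(k_s \mid k) \,\|\, r(k_s)\big)\big].
```

Maximizing the first bound while minimizing the second gives an end-to-end trainable objective in which the selector $p_\theta$ retains only answer-supporting knowledge, matching the select-then-predict behavior described above.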

3. Determinantal point process-based parameterized knowledge activation method for generative large language models

Generative large language models can activate and utilize their internal parameterized knowledge without updating model parameters, given only instructions or a few demonstrations; in-context learning is one effective such knowledge activation technique. However, different demonstrations activate knowledge to dramatically different degrees (in-context learning can range from random guessing to surpassing a fully fine-tuned pre-trained language model), so selecting demonstrations that effectively activate model knowledge becomes a key research problem in knowledge activation for large language models. To solve this problem, this study proposes a representative demonstration selection method based on the determinantal point process. First, building on instance-level attribution explanation methods, this study proposes an influence score metric that measures the contribution of one demonstration to the others in in-context learning and uses it to assess demonstration quality. Furthermore, considering that in-context learning usually involves multiple demonstrations, this study introduces a diversity metric, subdivided into semantic diversity and influence diversity. To jointly account for the three key factors of demonstration quality, semantic diversity, and influence diversity, this study proposes a demonstration selection method based on a two-stage determinantal point process. In the first stage, semantically diverse examples are selected with a determinantal point process; in the second stage, building on the first stage's results, high-quality examples with diverse influences are selected as the final demonstrations. Experimental results on multiple generative large language models and various natural language understanding tasks show that the proposed demonstration selection method activates the model's task knowledge more effectively, significantly improving in-context learning performance.
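The concrete kernels are not given in the abstract. A minimal pure-Python sketch of greedy MAP inference for a determinantal point process, plus a hypothetical two-stage wrapper in the spirit described above, could look like the following; the kernel construction, the quality-weighted similarity, and all names are illustrative assumptions rather than the dissertation's actual method:

```python
import math

def det(m):
    """Determinant via Gaussian elimination with partial pivoting
    (adequate for the small submatrices a DPP greedy step touches)."""
    n = len(m)
    a = [row[:] for row in m]
    d = 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(a[r][i]))
        if abs(a[p][i]) < 1e-12:
            return 0.0
        if p != i:
            a[i], a[p] = a[p], a[i]
            d = -d
        d *= a[i][i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    return d

def dpp_kernel(features, quality=None):
    """L[i][j] = q_i * cos(i, j) * q_j: diagonal rewards quality,
    off-diagonal penalizes similarity between items."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) *
                      math.sqrt(sum(b * b for b in v)))
    n = len(features)
    q = quality or [1.0] * n
    return [[q[i] * cos(features[i], features[j]) * q[j]
             for j in range(n)] for i in range(n)]

def greedy_dpp(L, k):
    """Greedy MAP: repeatedly add the item that most increases det(L_S),
    which trades off item quality against redundancy with the set so far."""
    selected = []
    for _ in range(k):
        best, best_gain = None, 0.0
        for i in range(len(L)):
            if i in selected:
                continue
            S = selected + [i]
            g = det([[L[a][b] for b in S] for a in S])
            if g > best_gain:
                best, best_gain = i, g
        if best is None:
            break
        selected.append(best)
    return selected

def two_stage_select(sem_feats, infl_feats, scores, k1, k2):
    """Stage 1: semantic diversity only. Stage 2: re-rank the survivors
    with influence scores as quality and influence-based similarity."""
    stage1 = greedy_dpp(dpp_kernel(sem_feats), k1)
    L2 = dpp_kernel([infl_feats[i] for i in stage1],
                    [scores[i] for i in stage1])
    return [stage1[j] for j in greedy_dpp(L2, k2)]
```

With such a kernel, near-duplicate demonstrations make det(L_S) collapse toward zero, so the greedy step naturally skips them; the second stage then promotes demonstrations that are both high-influence and complementary in influence.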

Keywords: pre-trained language models; interpretability; knowledge transfer; knowledge selection; knowledge activation
Language: Chinese
Document Type: Doctoral dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/56674
Collection: 毕业生_博士学位论文
Recommended Citation
GB/T 7714
杨朝. 基于解释增强的预训练语言模型知识利用关键技术研究[D],2024.