Research on Instance-Level Prompt Learning Methods Based on Text Pre-trained Language Models
Author: Jin Feihu
Date: 2023-05
Pages: 80
Degree type: Master's
Abstract (Chinese)

Pre-trained language models learn general language representations from large-scale unlabeled corpora through self-supervised learning and perform remarkably well on a wide range of natural language processing tasks. They have become the foundation models for various natural language processing tasks, and pre-training followed by fine-tuning has become the dominant approach to solving specific tasks. Recent studies show that the larger the model, the better its downstream performance. However, this pre-training plus fine-tuning paradigm requires storing a separate, fully fine-tuned model for every downstream task, which makes storage extremely expensive. As the parameter scale of pre-trained language models keeps growing, from hundreds of millions to hundreds of billions and even trillions, their efficient utilization has become a central challenge. To address this challenge, this thesis takes prompt learning, one of the approaches for efficiently utilizing pre-trained language models, as its entry point. The main contributions and innovations of the thesis are summarized as follows:

1. An instance-level prompt learning method based on pre-trained language models

Prompt learning has become a new paradigm for utilizing pre-trained language models: by adding only a very small number of parameters to the pre-trained model, it can achieve good results on downstream tasks. Existing discrete and continuous prompt learning methods use a fixed prompt for a given task, i.e., all samples of the task share the same prompt. However, the samples within a task can differ considerably: some are easy to handle while others are hard. It is therefore necessary to design prompt learning methods that fully exploit the characteristics of individual samples. To this end, this thesis proposes an instance-level prompt learning method that learns a different prompt for each sample. Specifically, the method assumes that each learnable prompt token contributes differently to different samples, and obtains a contribution score for each sample by computing the relevance between the sample and each prompt token. Because every learned prompt token receives a different contribution score for different samples, the resulting prompt is instance-aware. The method can be applied to both natural language understanding and generation tasks and is validated on both autoregressive and masked pre-trained language models. Experimental results show that the method achieves performance comparable to conventional full-parameter fine-tuning while tuning only about 1.5%~3.6% of the pre-trained model's parameters; in particular, it achieves the best results on few-shot learning benchmark datasets.

2. A prompt learning method combining task and instance information

The instance-level prompt learning method proposed in this thesis generates a prompt that depends on each sample itself but lacks general task information, whereas conventional task-level prompt learning methods assign the same prompt to all samples of a task and ignore the particularities of individual samples. To address this, this thesis proposes an efficient prompt learning method that combines task and instance information: it dynamically decides, according to the characteristics of the task and of each sample, the degree to which task-level and instance-level prompt information are fused, producing prompts that carry both kinds of information. The method is validated on thirteen natural language understanding datasets. Experimental results show that, in the few-shot learning setting, the method obtains significant performance gains over existing prompt learning methods while tuning only about 0.12% of the parameters, and it also outperforms existing state-of-the-art parameter-efficient few-shot learning methods.

Abstract (English)

Pre-trained language models use self-supervised learning to acquire universal language representations from large-scale unlabeled corpora and demonstrate excellent performance on a wide range of natural language processing tasks. Fine-tuning all parameters of a pre-trained model has become the most common way to adapt it to downstream tasks, and this pre-training plus fine-tuning paradigm is now the dominant approach to solving specific natural language processing tasks. Recent studies consistently show that larger pre-trained models yield better downstream performance. However, impressive as the results are, this paradigm requires storing a separate, fully optimized copy of the model for each downstream task, which becomes prohibitively expensive as the number of parameters grows from hundreds of millions to hundreds of billions or even trillions. To address these limitations, this thesis investigates the efficient utilization of pre-trained language models from the perspective of prompt learning. The main contributions and innovations of this thesis are as follows:

1. An Instance-aware Prompt Learning Method Based on Pre-trained Language Models

Prompt learning is a new approach to leveraging pre-trained language models (PLMs) that adds only a minimal number of parameters yet shows promising results on downstream tasks. However, current methods assume a fixed prompt for a specific task, which may not be ideal: the samples within a task can be diverse and of varying difficulty, and may therefore require different prompts. To address this limitation, we propose an instance-aware prompt learning method that learns a different prompt for each instance. Our method assigns a different contribution to each learnable prompt token based on its relevance score to each instance. The prompt is thus weighted according to the needs of each instance, making it instance-aware. We evaluate our method on both unidirectional and bidirectional PLMs for both language understanding and generation tasks. Our experiments demonstrate that our method achieves results comparable to fully fine-tuned PLMs while tuning only 1.5%~3.6% of their parameters, and that it outperforms strong baselines. Notably, our method achieves state-of-the-art performance on the SuperGLUE few-shot learning benchmark with small language models.
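For illustration only, the minimal PyTorch sketch below shows one way instance-aware weighting of shared prompt tokens could be realized. The module name, the mean-pooled instance summary, the linear scoring projection, and the softmax weighting are assumptions made for exposition, not the exact formulation used in the thesis.

    import torch
    import torch.nn as nn

    class InstanceAwarePrompt(nn.Module):
        """Hypothetical sketch: weight shared prompt tokens per instance."""

        def __init__(self, num_prompt_tokens: int, hidden_size: int):
            super().__init__()
            # Learnable prompt tokens shared across the task.
            self.prompt_tokens = nn.Parameter(torch.randn(num_prompt_tokens, hidden_size))
            # Projection used to score each prompt token against an instance summary.
            self.score_proj = nn.Linear(hidden_size, hidden_size)

        def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
            # input_embeds: (batch, seq_len, hidden) token embeddings of the instance.
            instance_repr = input_embeds.mean(dim=1)                        # (batch, hidden)
            # Relevance score of every prompt token with respect to this instance.
            scores = self.score_proj(instance_repr) @ self.prompt_tokens.T  # (batch, num_prompt)
            weights = torch.softmax(scores, dim=-1).unsqueeze(-1)           # (batch, num_prompt, 1)
            # Re-weighting the shared tokens makes the final prompt instance-aware.
            prompt = weights * self.prompt_tokens.unsqueeze(0)              # (batch, num_prompt, hidden)
            # Prepend the instance-aware prompt to the input embeddings.
            return torch.cat([prompt, input_embeds], dim=1)

    # Example usage with made-up shapes: 20 prompt tokens, hidden size 768.
    layer = InstanceAwarePrompt(num_prompt_tokens=20, hidden_size=768)
    out = layer(torch.randn(4, 32, 768))  # -> (4, 52, 768)

In such a sketch only the prompt-related parameters would be trained while the backbone PLM stays frozen, which is consistent with the small fraction of tuned parameters reported above.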

2. A Prompt Learning Approach that Combines Task-specific and Instance-dependent Information

The instance-dependent prompt learning method proposed above generates prompts that depend on the instance itself but lack general information about the task, while traditional task-specific prompt learning methods assign the same prompt to all instances of a task, ignoring the particularity of individual instances. To address this, this thesis proposes an efficient few-shot learning method that dynamically decides the degree to which task-specific and instance-dependent information are incorporated, according to the characteristics of the task and the instance, enriching the prompt with both task-specific and instance-dependent information. Extensive experiments on a wide range of natural language understanding tasks demonstrate that our approach obtains significant improvements over prompt-based and parameter-efficient tuning baselines in the few-shot setting with only about 0.12% of parameters tuned. Moreover, our approach outperforms existing state-of-the-art efficient few-shot learning methods on several natural language understanding tasks.
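As a purely illustrative companion to the description above, the following sketch shows one possible gated fusion of a shared task-level prompt with an instance-level prompt. The gating network, the mean-pooled instance summary, and all names are assumptions, not the method's actual architecture.

    import torch
    import torch.nn as nn

    class TaskInstancePromptFusion(nn.Module):
        """Hypothetical sketch: gate between task-level and instance-level prompts."""

        def __init__(self, num_prompt_tokens: int, hidden_size: int):
            super().__init__()
            # Task-level prompt shared by every instance of the task.
            self.task_prompt = nn.Parameter(torch.randn(num_prompt_tokens, hidden_size))
            # Small network mapping an instance summary to an instance-level prompt.
            self.instance_proj = nn.Linear(hidden_size, num_prompt_tokens * hidden_size)
            # Scalar gate deciding how much task vs. instance information to keep.
            self.gate = nn.Linear(hidden_size, 1)

        def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
            batch, _, hidden = input_embeds.shape
            instance_repr = input_embeds.mean(dim=1)                               # (batch, hidden)
            instance_prompt = self.instance_proj(instance_repr).view(batch, -1, hidden)
            # Per-instance mixing ratio in [0, 1].
            g = torch.sigmoid(self.gate(instance_repr)).unsqueeze(-1)              # (batch, 1, 1)
            task_prompt = self.task_prompt.unsqueeze(0).expand(batch, -1, -1)
            fused_prompt = g * task_prompt + (1.0 - g) * instance_prompt
            # Prepend the fused prompt to the input embeddings.
            return torch.cat([fused_prompt, input_embeds], dim=1)

A dynamic per-instance gate such as g above is one way to realize "deciding the degree to which task-specific and instance-dependent information are incorporated"; the thesis may implement this fusion differently.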

Keywords: pre-trained language models; prompt learning; parameter-efficient fine-tuning; few-shot learning
Language: Chinese
Sub-direction classification (seven major research directions): Natural Language Processing
State Key Laboratory planning direction classification: Speech and Language Processing
Thesis-associated dataset requiring deposit:
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/52012
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems_Natural Language Processing
Graduates_Master's Theses
Recommended citation (GB/T 7714):
Jin Feihu. 基于文本预训练语言模型的样本级提示学习方法研究[D], 2023.
Files in this item:
基于文本预训练语言模型的样本级提示学习方 (3465 KB) · Document type: Thesis · Access: Open Access · License: CC BY-NC-SA