Research on Task-Oriented Dialogue Techniques for Scenarios with Sparse Labeled Data
Liu Qingbin (刘庆斌)
2021-11
Pages: 152
Degree type: Doctoral
Chinese Abstract

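A dialogue system is a computer application that enables machines to communicate with humans in natural language. In academia, dialogue systems are an important research topic in artificial intelligence and natural language processing, since they can endow machines with the intelligence needed to pass the Turing test. In industry, dialogue systems are already widely deployed and hold great commercial value in business scenarios such as smartphone assistants and customer service systems. Task-oriented dialogue systems are an intensively studied type of dialogue system; they aim to help users complete specific tasks through human-machine dialogue, such as ordering food, checking the weather, or navigating.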

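In recent years, with advances in data-driven deep learning, task-oriented dialogue systems have developed considerably. A common pipelined task-oriented dialogue system consists of four parts: dialogue intent detection, dialogue state tracking, dialogue policy learning, and dialogue response generation. Because of this structural complexity, such systems require large amounts of high-quality labeled data. In addition, task-oriented dialogue systems are domain-specific: labeled data differs greatly across domains, and labeled data from old domains often cannot be transferred to new ones, so extending a system to a new domain still requires labeling a large amount of data. However, data annotation is expensive, time-consuming, and labor-intensive, and this has become one of the main problems restricting the development of task-oriented dialogue systems. This thesis therefore studies task-oriented dialogue systems for scenarios with sparse labeled data. Different parts of a task-oriented dialogue system often involve different sparse-data scenarios; this thesis focuses on incremental learning, zero-shot learning, and unsupervised learning scenarios. The main research contents and contributions of this thesis are as follows: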

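1. An incremental learning method for dialogue intent detection and dialogue state tracking

Dialogue intent detection and dialogue state tracking are two important modules of pipelined task-oriented dialogue systems. Traditional dialogue intent detection models usually adopt an offline training paradigm, which requires a fixed, predefined set of intent classes. In the real world, however, online dialogue systems need to handle new intents that keep emerging. This thesis therefore studies the incremental intent detection task, in which the model is trained continually on new data to learn newly emerging intents incrementally while avoiding performance degradation on old data. In this task, traditional incremental learning methods have difficulty handling the imbalance between old and new data. This thesis therefore proposes a new incremental learning method that uses hierarchical knowledge distillation and an inter-class margin loss to address the data imbalance problem. Experimental results demonstrate the effectiveness of the method.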

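In addition, traditional dialogue state tracking models usually also adopt an offline training paradigm, which requires a fixed dataset. However, online dialogue systems usually need to handle new domains and new data that keep emerging. This thesis therefore studies dialogue state tracking based on domain-incremental learning, in which the model is trained continually on new data to learn emerging domains while maintaining its performance on old domains. To handle this task, this thesis proposes a new incremental learning method, called the knowledge preservation network, which addresses the expression diversity and combinatorial explosion problems through multi-prototype enhanced retrospection and multi-strategy knowledge distillation. Experimental results show that the method effectively alleviates performance degradation on old domains and achieves the best performance.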

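2. A zero-shot learning method for dialogue state tracking

In the dialogue state tracking task, a dialogue state usually consists of multiple slot-value pairs. In the real world, however, values are often extremely varied and change dynamically over time, so many values have no corresponding training data; these are called unknown values. Studying dialogue state tracking in this zero-shot learning scenario can improve a dialogue system's ability to understand unseen user requirements. Previous generative dialogue state tracking methods extract text spans from the dialogue history as values in order to handle unknown values. However, some dialogues contain no explicit expression of the value, which makes it hard for previous generative methods to extract a corresponding text span. Moreover, these methods use rules to back-label text spans in dialogues as annotations, and the incompleteness of the rules causes missing labels, which limits the model's learning of correct values. To address these problems, this thesis proposes a unified strategy that combines a classification mechanism over predefined values with a copy mechanism over the dialogue history, handling both unknown values and the non-extractable problem. In addition, this thesis uses a reinforcement learning method based on semantic relations to alleviate the missing-label problem; it treats the semantic relations between dialogue text spans and predefined slot-value pairs as weak supervision. Experimental results show that the method can effectively handle dialogue state tracking in the zero-shot learning scenario.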

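3. An end-to-end task-oriented dialogue system for the unsupervised learning scenario

Pipelined task-oriented dialogue systems consist of multiple independent modules. To reduce the cost of designing and maintaining these modules, many researchers have begun to study end-to-end task-oriented dialogue systems. Common end-to-end task-oriented dialogue systems fall into two categories: methods based on a single-task framework and methods based on a multi-task framework. Methods based on a single-task framework take the dialogue history and the knowledge base as input and output the system response directly through an end-to-end model. Such methods learn the dialogue sub-tasks implicitly and without supervision, so labeled data for intermediate modules is no longer needed. The main challenge of this task is how to incorporate the external knowledge base effectively into the learning framework. However, previous methods usually ignore the graph structure information in the knowledge base and the relevant information between the knowledge base and the dialogue history. In addition, the training data for different types of target entities is often imbalanced, which limits the learning of difficult target entities. To address these two problems, this thesis proposes an end-to-end task-oriented dialogue system based on heterogeneous relational graph neural networks and an adaptive objective. The heterogeneous relational graph neural network jointly encodes the dialogue history and the knowledge base to capture graph structure information. The adaptive objective dynamically adjusts the learning weights of different entities during training to handle the entity imbalance problem. Experimental results show that the proposed method can effectively generate knowledge-rich responses and outperforms previous methods.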

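In addition, methods based on a multi-task framework handle multiple dialogue sub-tasks, such as dialogue state tracking and template generation, through a single end-to-end model. Previous methods usually learn dialogue state tracking under a supervised paradigm, which requires a large amount of labeled data. However, data annotation is expensive, time-consuming, and labor-intensive. To solve this problem, this thesis proposes a multi-span prediction network that performs unsupervised dialogue state tracking within the multi-task framework, reducing the model's dependence on labeled data. Specifically, this thesis proposes a split-merge copy mechanism that automatically generates keywords by modeling the dependencies between the dialogue history and the response. It also designs a clustering method based on semantic distance to obtain the values of each slot from these keywords. Furthermore, it proposes an ontology-based reinforcement learning method that uses the values obtained by clustering to train the model to generate accurate dialogue states. Experimental results show that the method achieves a significant improvement over previous unsupervised methods. This thesis also constructs a Chinese dialogue dataset in the medical domain, and experiments on this dataset further validate the adaptability of the proposed method.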

English Abstract

Dialogue systems are a kind of computer application system that provides a natural way for human-machine communication. In academia, dialogue systems are an important research topic in artificial intelligence and natural language processing, which can empower machines with intelligence to pass the Turing test. In industry, dialogue systems have been widely used, and they have shown great commercial value in many scenarios such as smartphone assistants and customer service systems. Task-oriented dialogue systems are a type of dialogue system that has been widely studied. They aim at helping users accomplish specific tasks, such as reserving restaurants, checking the weather, or navigating.

Recently, with the advancement of data-driven deep learning technology, task-oriented dialogue systems have made great progress. A common pipelined task-oriented dialogue system consists of four components: dialogue intent detection, dialogue state tracking, dialogue policy learning, and dialogue response generation. Due to the complexity of this structure, such dialogue systems require a large amount of high-quality labeled data. In addition, task-oriented dialogue systems are domain-limited, and the labeled data in different domains differ greatly. Labeled data from old domains often cannot be transferred to new domains, so expanding to a new domain still requires a large amount of labeled data. However, data annotation is costly and time-consuming, which has become one of the main factors limiting the development of task-oriented dialogue systems. Therefore, this thesis studies task-oriented dialogue methods for scenarios with sparse labeled data. Different parts of task-oriented dialogue systems often involve different sparse-data scenarios. This thesis mainly focuses on incremental learning scenarios, zero-shot learning scenarios, and unsupervised learning scenarios. The main research contents and innovations of this thesis are as follows:

1. Incremental learning for dialogue intent detection and dialogue state tracking

Dialogue intent detection and dialogue state tracking are two important components of pipelined task-oriented dialogue systems. Traditional dialogue intent detection models usually adopt an offline learning paradigm, which can only handle fixed predefined intent classes. However, in real-world applications, online dialogue systems usually need to deal with continually emerging new intents. Therefore, this thesis proposes the incremental intent detection task, which continually trains the model on new data to incrementally learn emerging intents while avoiding performance degradation on old data. However, traditional incremental learning methods suffer from the data imbalance problem between old data and new data. Therefore, this thesis proposes a new incremental learning method, which consists of hierarchical knowledge distillation and inter-class margin loss to solve the data imbalance problem. Experimental results demonstrate the effectiveness of the method.
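The two ingredients named above can be illustrated with a minimal sketch. It shows only generic temperature-based knowledge distillation and a hinge-style margin between the gold class and its strongest competitor; the abstract does not specify the thesis's hierarchical formulation, so the function names, the temperature, and the margin value here are all illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(old_logits, new_logits, temperature=2.0):
    """Cross-entropy between the old model's softened distribution and the
    new model's distribution, restricted to the old intent classes.
    Assumes the new model lists the old classes first (hypothetical layout)."""
    n_old = len(old_logits)
    p_old = softmax(old_logits, temperature)
    p_new = softmax(new_logits[:n_old], temperature)
    return -sum(p * math.log(q) for p, q in zip(p_old, p_new))


def margin_loss(logits, gold, margin=1.0):
    """Hinge penalty when the gold logit does not beat the best other
    class by at least `margin`."""
    best_other = max(x for i, x in enumerate(logits) if i != gold)
    return max(0.0, margin - (logits[gold] - best_other))
```

In such a setup the distillation term discourages the updated model from drifting on old intents, while the margin term pushes old and new classes apart; how the thesis weights or structures these terms is not stated in this record.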

Traditional dialogue state tracking models are usually trained offline, which requires a fixed dataset. However, the offline paradigm is impractical in the real world, as online dialogue systems usually need to deal with continually emerging new domains and new data. Therefore, this thesis studies domain-incremental learning for dialogue state tracking, which continually trains the model on new data to learn continually emerging new domains while avoiding performance degradation on old domains. To handle this task, this thesis proposes a new domain-incremental learning method, called knowledge preservation networks, which addresses the expression diversity and combinatorial explosion problems through multi-prototype enhanced retrospection and multi-strategy knowledge distillation. Experimental results show that this method effectively alleviates the problem of performance degradation on old domains and achieves the best performance.

2. Zero-shot learning for dialogue state tracking

In the dialogue state tracking task, a dialogue state usually consists of multiple slot-value pairs. However, real-world values are often very diverse and dynamically change over time, so there is no corresponding training data for many values. We call these values unknown values. The study of zero-shot learning for dialogue state tracking can improve the ability of dialogue systems to understand unseen user requirements. Previous generative dialogue state tracking methods extract text spans from the dialogue history as values to handle unknown values. However, some dialogues do not contain explicit expressions of values, making previous generative approaches unable to extract the corresponding text spans. In addition, they use rules to label text spans in dialogues as annotations. The incompleteness of the rules results in unlabeled instances, which limit the learning of correct values. To address these two problems, this thesis proposes a unified strategy that incorporates both a classification mechanism on predefined values and a copy mechanism in the dialogue history to deal with both the unknown values and the non-extractable problem. Meanwhile, this thesis alleviates the problem of missing labels through a reinforcement learning method based on semantic relations, which utilizes the semantic relations between text spans and predefined slot-value pairs as weakly supervised information. Experimental results show that the method can effectively handle the dialogue state tracking task in a zero-shot learning scenario.
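One common way to combine a classifier over predefined values with a copy distribution over dialogue-history spans is a gated mixture, in the style of pointer-generator models. The sketch below illustrates that general idea only; the abstract does not say how the thesis fuses the two mechanisms, and `combine_value_scores`, the `gate` parameter, and the toy probabilities are hypothetical.

```python
def combine_value_scores(class_probs, copy_probs, gate):
    """Mix two distributions over candidate values for one slot.

    class_probs: dict mapping predefined values to probabilities.
    copy_probs:  dict mapping dialogue-history spans to probabilities.
    gate:        weight in [0, 1] given to the classifier (assumed to be
                 predicted by the model; here it is passed in directly).
    """
    scores = {}
    for value, prob in class_probs.items():
        scores[value] = scores.get(value, 0.0) + gate * prob
    for span, prob in copy_probs.items():
        scores[span] = scores.get(span, 0.0) + (1.0 - gate) * prob
    return scores


def predict_value(class_probs, copy_probs, gate):
    """Return the highest-scoring candidate value."""
    scores = combine_value_scores(class_probs, copy_probs, gate)
    return max(scores, key=scores.get)
```

With a low gate the copy side dominates, so a span that never appears in the predefined value list can still be selected; with a high gate the model can fall back on a predefined value when the dialogue contains no explicit span.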

3. Unsupervised learning for end-to-end task-oriented dialogue systems

Pipelined task-oriented dialogue systems consist of several independent modules. To reduce the cost of designing and maintaining these modules, many researchers have started to study end-to-end task-oriented dialogue systems. Common end-to-end task-oriented dialogue systems can be divided into two categories: approaches based on a single-task framework and approaches based on a multi-task framework. The approaches based on a single-task framework take the dialogue history and the knowledge base as input and output responses directly through an end-to-end model. Such approaches learn dialogue sub-tasks implicitly and in an unsupervised manner, eliminating the need for labeled data of intermediate modules. The main challenge of this task is how to effectively incorporate external knowledge bases into the learning framework. However, previous methods usually ignore the graph structure information in the knowledge base and the relevant information between the knowledge base and the dialogue history. In addition, they ignore the entity imbalance problem, which limits the learning of difficult target entities. To address these two problems, this thesis proposes heterogeneous relational graph neural networks with an adaptive objective for end-to-end task-oriented dialogue systems. The heterogeneous relational graph neural networks jointly encode the knowledge base and the dialogue history to capture graph structure information. The adaptive objective dynamically adjusts the learning weights of different entities during the training process to address the entity imbalance problem. Experimental results show that the proposed method can effectively generate knowledge-rich responses and significantly outperforms previous methods.
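The idea of dynamically reweighting entities during training can be sketched as follows. This is an assumed, simplified scheme in which entity types that the model currently predicts poorly receive larger loss weights; the abstract does not give the thesis's actual weighting rule, and `adaptive_weights`, `weighted_loss`, and the `smoothing` constant are hypothetical.

```python
def adaptive_weights(accuracy_by_type, smoothing=0.1):
    """Give each entity type a weight that grows as its current accuracy
    falls, normalized so that the weights average to 1."""
    raw = {t: 1.0 - acc + smoothing for t, acc in accuracy_by_type.items()}
    mean = sum(raw.values()) / len(raw)
    return {t: r / mean for t, r in raw.items()}


def weighted_loss(losses, entity_types, weights):
    """Average per-token losses, scaling each by its entity type's weight."""
    total = sum(weights[t] * loss for loss, t in zip(losses, entity_types))
    return total / len(losses)
```

Recomputing the weights from validation accuracy after each epoch would make the objective adapt as training progresses, which is one plausible reading of "dynamically adjusts the learning weights".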

The approaches based on a multi-task framework handle multiple dialogue subtasks, such as dialogue state tracking and template generation, through an end-to-end model. Previous approaches usually adopt a supervised paradigm to learn dialogue state tracking, which requires a large amount of labeled data. However, data annotation is costly and time-consuming. To address this problem, this thesis proposes a multi-span prediction network for unsupervised dialogue state tracking in end-to-end task-oriented dialogue systems. The proposed method can alleviate the reliance of the model on labeled data. Specifically, this thesis proposes a split-merge copy mechanism that automatically generates keywords by modeling the dependencies between dialogue histories and responses. This thesis also proposes a clustering method based on semantic distance to obtain the values of each slot from these keywords. In addition, this thesis proposes an ontology-based reinforcement learning approach, which uses values and raw dialogues to learn dialogue state tracking. Experimental results show that the method achieves significant improvement over previous unsupervised methods. In addition, this thesis constructs a new Chinese dialogue dataset in the medical domain, and experiments on this dataset further validate the adaptability of the proposed method.
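Grouping keywords into slot values by semantic distance can be illustrated with a small sketch. It assumes each keyword and each slot already has an embedding vector and assigns a keyword to the nearest slot by cosine distance, discarding keywords that are far from every slot; the thesis's actual clustering procedure is not described in this record, and the function names, threshold, and toy vectors are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def assign_keywords(keyword_vecs, slot_vecs, max_distance=0.5):
    """Map each keyword to its nearest slot, skipping keywords whose
    nearest slot is farther than `max_distance`."""
    values = {slot: [] for slot in slot_vecs}
    for word, vec in keyword_vecs.items():
        slot, dist = min(
            ((s, cosine_distance(vec, sv)) for s, sv in slot_vecs.items()),
            key=lambda pair: pair[1],
        )
        if dist <= max_distance:
            values[slot].append(word)
    return values
```

The resulting per-slot value lists could then serve as a noisy ontology, which is the role the abstract assigns to the clustered values.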

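Keywords: natural language processing; task-oriented dialogue systems; sparse labeled data
Language: Chinese
Seven major directions, sub-direction classification: Natural language processing
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/46636
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems_Natural Language Processing
Corresponding author: Liu Qingbin (刘庆斌)
Recommended citation (GB/T 7714):
Liu Qingbin. Research on Task-Oriented Dialogue Techniques for Scenarios with Sparse Labeled Data [D]. Institute of Automation, Chinese Academy of Sciences; Academic Degree Evaluation Committee of the University of Chinese Academy of Sciences, 2021.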
Files in this item
File name/size: 刘庆斌-博士论文.pdf (5059KB)
Document type: Thesis
Access type: Open access
License: CC BY-NC-SA