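Research on Key Technologies of Knowledge-Fused Generative Dialogue Systems (融合知识的生成式对话系统关键技术研究)
Liu Cao (刘操)
Subtype: Doctoral
Thesis Advisor: Zhao Jun (赵军)
2020-05-31
Degree Grantor: University of Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Name: Doctor of Engineering
Degree Discipline: Computer Application Technology
Keyword: natural language generation, dialogue systems, knowledge fusion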
Abstract

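Dialogue systems aim to let computers communicate with humans in natural language, and they have been an important research task in artificial intelligence and natural language processing ever since the Turing Test was proposed. As the commercial value of dialogue systems in precision marketing, intelligent education, emotional companionship, and similar areas has gradually been realized, major Internet companies have invested in them, with products such as Amazon's Echo, Microsoft's XiaoIce, and Baidu's Duer.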

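In recent years, with the development of deep learning, data-driven generative methods have become the research trend in dialogue systems. However, these data-driven methods are especially prone to producing generic responses such as “I don't know”, “Me too”, and “Sorry”. Because they lack knowledge modeling, current dialogue systems struggle to give informative responses and cannot meet users' information needs, which has become the main bottleneck limiting their development. Introducing knowledge into dialogue systems can effectively alleviate this problem. This thesis therefore studies key technologies for knowledge-fused generative dialogue systems, focusing on knowledge types such as world knowledge, lexical knowledge, situational knowledge, and contextual knowledge. The main research contents and contributions are as follows: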

1. An answer generation method that incorporates world knowledge

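Dialogue systems lack a model of factual knowledge, so they tend to produce answers that are fluent but carry little knowledge. To address this, this thesis studies natural answer generation models that incorporate world knowledge. Because mainstream deep-learning-based natural language generation models have difficulty incorporating symbolically represented world knowledge, this thesis proposes a natural answer generation model that integrates retrieval and copy mechanisms. The model draws on knowledge base retrieval, copying from the question, and prediction from the vocabulary to obtain different types of words for natural answer generation. Experiments show that the method effectively incorporates symbolic world knowledge and generates more accurate answers.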

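In addition, the data used to train the above model varies in quality: high-quality data is limited, while low-quality data tends to introduce noise. To address this, this thesis also proposes a natural answer generation model based on curriculum learning. The model first learns a basic model from low-quality, simple training data, and then learns a better model from high-quality, complex training data. Experiments show that the model makes full use of training data of uneven quality and further improves natural answer generation.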

2. A question generation method based on world knowledge

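Existing knowledge-based question answering (dialogue) lacks high-quality training data. One solution is to generate questions from given knowledge. However, questions generated by traditional methods often fail to express the given knowledge and tend to correspond to ambiguous answers. To address this, this thesis studies question generation based on world knowledge and proposes a question generation model that combines diversified contexts with answer-aware auxiliary supervision. The model uses existing resources in the knowledge base to expand the given knowledge and enrich its context. Moreover, since a human-annotated question may correspond to several ambiguous answers, the model uses the answer as auxiliary supervision in addition to the human-annotated question as the main supervision, so that generated questions correspond to more definite answers. Experimental results show that the proposed model outperforms baseline systems on multiple question generation metrics and can effectively and automatically construct question answering (dialogue) datasets based on world knowledge.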

3. A response generation method that incorporates lexical knowledge

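Besides world knowledge, dialogue responses also depend on the guidance of lexical knowledge. This thesis therefore studies dialogue response generation that incorporates lexical knowledge. Because traditional methods capture only limited information with a single vocabulary and single-pass decoding, this thesis proposes the vocabulary pyramid network, a multi-pass encoding and decoding response generation model built on multi-level vocabularies. The model expands a single vocabulary into multi-level vocabularies through hierarchical clustering. During clustering the vocabulary size decreases level by level, forming a “pyramid” structure, so that the multi-level vocabularies model synonym and hypernym-hyponym relations. A multi-pass encoder-decoder then operates on the vocabularies at different levels, and the results are fused into the final decoded sequence. Dialogue response generation experiments on Chinese and English datasets demonstrate the effectiveness of the proposed model: compared with the single vocabulary and single-pass decoding of traditional methods, it captures richer encoding and decoding information across the multi-level vocabularies.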

4. A response generation method that incorporates situational and contextual knowledge

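Besides world knowledge and lexical knowledge, dialogue responses also depend on situational knowledge and contextual knowledge, so this thesis also studies dialogue response generation that incorporates both. Existing response generation models for scenarios with given persona descriptions ignore the relations among persona descriptions and use the same decoder for all of them, which makes it hard to generate personalized responses. To address this, this thesis proposes a personalized response generation model for scenarios with given persona descriptions. The model uses a multi-perspective graph that models the relation between persona descriptions and the dialogue history and also captures the relations among different persona description sentences. In addition, the model uses a persona-adaptive decoder that builds dynamic parameters for different scenarios and thus generates more personalized responses. The model outperforms baseline systems in both automatic and human evaluation on public data, which indicates that it can effectively perceive persona description scenarios in dialogue systems.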

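The above method relies on the situation provided by given persona descriptions, but most dialogues have no such descriptions. Moreover, research on response generation for multi-party conversations, which are widespread, is currently lacking. To address this, this thesis extends dialogue generation to multi-party conversations and proposes a multi-party response generation model that incorporates the interlocutors' background from context. The model derives interlocutor information, such as the current speaker, the addressee, and third-party observers, from the dialogue context, so it no longer depends on given persona descriptions. In addition, given the importance of the current addressee, this thesis proposes an addressee memory mechanism that strengthens the interlocutors' contextual information. Finally, a dataset for multi-party dialogue generation is constructed on the basis of existing data. Experiments show that the method achieves better results than baseline systems and effectively captures interlocutor background information from context in multi-party dialogue generation.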

Other Abstract

A dialogue system is a computer system designed to communicate with humans in natural language, and it has been an important research task in artificial intelligence and natural language processing since the Turing Test was proposed. In recent years, owing to the great commercial value of dialogue systems in precision marketing, intelligent education, emotional support, and similar areas, many Internet giants have deployed dialogue systems, such as Amazon's Echo, Microsoft's XiaoIce, and Baidu's Duer.

Recently, with the development of deep learning, data-driven generative methods have become the research trend in dialogue systems. However, such methods are prone to generating safe responses such as “I don't know”, “Me too”, and “Sorry”. Because they lack knowledge modeling, current dialogue systems struggle to produce responses with substantial content or to meet users' information needs, and this has become the main bottleneck for dialogue systems. Incorporating knowledge into dialogue systems can alleviate the problem of safe responses. Therefore, this thesis studies the key technologies of knowledge-fused generative dialogue systems, focusing on knowledge types such as world knowledge, lexical knowledge, situational knowledge, and contextual knowledge. The main research contents and innovations of this thesis are as follows:

1. Incorporating world knowledge into answer generation

Current dialogue systems do not model factual knowledge, so they are prone to generating answers that are fluent but lack knowledge. This thesis therefore incorporates world knowledge into answer generation. Deep-learning-based natural language generation models have difficulty exploiting symbolic world knowledge. To this end, this thesis proposes to incorporate copy and retrieval mechanisms into answer generation. Specifically, the model uses three modes to obtain different semantic units: copying from the source question, retrieving from the knowledge base, and predicting from the vocabulary. Experiments demonstrate that the proposed model fuses symbolic world knowledge effectively and generates more accurate answers.
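One common way to combine copy, retrieve, and predict modes is to mix per-mode word distributions with a gate, and the sketch below illustrates only that general idea. The abstract does not say how the thesis model combines the modes, so the function `mix_modes`, the gate values, and the toy words are all hypothetical.

```python
from collections import defaultdict


def mix_modes(gate, vocab_dist, copy_dist, kb_dist):
    """Blend three per-mode word distributions into one (illustrative assumption).

    gate maps each mode name to a weight; the weights must sum to 1.
    """
    assert abs(sum(gate.values()) - 1.0) < 1e-9
    mixed = defaultdict(float)
    modes = (("predict", vocab_dist), ("copy", copy_dist), ("retrieve", kb_dist))
    for mode, dist in modes:
        for word, prob in dist.items():
            mixed[word] += gate[mode] * prob
    return dict(mixed)


# Hypothetical decoding step for a question such as "Where was Alice born?"
gate = {"predict": 0.5, "copy": 0.2, "retrieve": 0.3}
vocab_dist = {"was": 0.6, "born": 0.4}       # words predicted from the vocabulary
copy_dist = {"Alice": 1.0}                   # words copied from the question
kb_dist = {"Paris": 0.9, "1990": 0.1}        # words retrieved from the knowledge base

mixed = mix_modes(gate, vocab_dist, copy_dist, kb_dist)
best_word = max(mixed, key=mixed.get)
```

Because every mode contributes to a single distribution, a word that appears only in the knowledge base can still compete with ordinary vocabulary words at each decoding step.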

Moreover, the training data used by the above model is uneven in quality: low-quality data tends to introduce noise, while high-quality data is scarce. To address this issue, this thesis presents curriculum learning for natural answer generation. The model first learns a basic model from low-quality, simple training data, and then learns a better model from high-quality, complex training data. Experimental results demonstrate that the proposed model makes full use of training data of uneven quality and further improves the performance of natural answer generation.
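The two-stage schedule described above can be sketched as follows. This is a minimal illustration, assuming each training example carries a hypothetical `quality` score and that a fixed threshold separates the two stages; the abstract does not specify how the thesis measures quality or complexity.

```python
def split_by_quality(examples, threshold):
    """Split examples into a low-quality stage and a high-quality stage."""
    low = [ex for ex in examples if ex["quality"] < threshold]
    high = [ex for ex in examples if ex["quality"] >= threshold]
    return low, high


def train_with_curriculum(model, examples, threshold, update):
    """Run updates on low-quality data first, then on high-quality data."""
    low, high = split_by_quality(examples, threshold)
    for stage in (low, high):
        for ex in stage:
            model = update(model, ex)
    return model


# Hypothetical data; the "model" here is just a log of the order of updates.
examples = [
    {"id": "a", "quality": 0.9},
    {"id": "b", "quality": 0.2},
    {"id": "c", "quality": 0.4},
    {"id": "d", "quality": 0.8},
]


def record_order(seen, ex):
    return seen + [ex["id"]]


order = train_with_curriculum([], examples, 0.5, record_order)
```

In a real system `update` would be a gradient step; the stand-in above only makes the ordering of the two stages visible.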

2. Incorporating world knowledge into question generation

Existing knowledge-based question answering (dialogue) systems lack high-quality training data. One solution is to generate questions from given knowledge. However, questions generated by conventional methods often fail to express the given knowledge and tend to correspond to ambiguous answers. To this end, this thesis studies question generation based on world knowledge and proposes to incorporate diversified contexts and answer-aware supervision into question generation. First, the model leverages diversified contexts to enrich the given knowledge. Second, in addition to question-aware supervision, answer-aware supervision is introduced so that generated questions correspond to more definite answers. Experiments demonstrate that the proposed model outperforms the baselines on multiple question generation metrics and can effectively build knowledge-based datasets for question answering or dialogue systems.

3. Incorporating lexical knowledge into response generation

Besides world knowledge, dialogue responses need the guidance of lexical knowledge. Therefore, this thesis incorporates lexical knowledge into dialogue response generation. Traditional methods capture limited information through a single vocabulary and one-pass decoding. To this end, this thesis proposes a vocabulary pyramid network, which incorporates multi-pass encoding and decoding with multi-level vocabularies into response generation. The model expands a single vocabulary into multi-level vocabularies through hierarchical clustering. Across the levels the vocabulary size decreases step by step, so the vocabularies form a “pyramid”. In this way, the multi-level vocabularies model synonym and hypernym-hyponym relations. Moreover, this thesis proposes a multi-pass encoder and decoder that operate on the multi-level vocabularies, and their outputs are fused into the final decoded sequence. Experiments on Chinese and English datasets demonstrate the effectiveness of the proposed model, which captures richer encoding and decoding information from the multi-level vocabularies.
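To make the “pyramid” concrete, the sketch below builds multi-level vocabularies by repeatedly merging the two closest word clusters until each level reaches a target size. The abstract says only that hierarchical clustering is used, so the centroid-distance merging rule, the toy two-dimensional embeddings, and the level sizes are assumptions for illustration.

```python
import math


def centroid(cluster, emb):
    """Mean embedding of the words in a cluster."""
    dims = len(next(iter(emb.values())))
    return [sum(emb[w][d] for w in cluster) / len(cluster) for d in range(dims)]


def merge_to_size(clusters, emb, target):
    """Greedily merge the two closest clusters until `target` clusters remain."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > target:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i], emb), centroid(clusters[j], emb))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def build_pyramid(emb, sizes):
    """Level 0 is the raw vocabulary; each later level is a coarser clustering."""
    level = [[w] for w in emb]
    pyramid = [level]
    for size in sizes:
        level = merge_to_size(level, emb, size)
        pyramid.append(level)
    return pyramid


# Hypothetical 2-D embeddings for a five-word vocabulary.
emb = {
    "cat": (0.0, 0.0),
    "kitten": (0.1, 0.0),
    "dog": (1.0, 0.0),
    "puppy": (1.1, 0.0),
    "car": (5.0, 5.0),
}
pyramid = build_pyramid(emb, sizes=[3, 2])
level_sizes = [len(level) for level in pyramid]
```

In this toy run the middle level groups near-synonyms ("cat" with "kitten", "dog" with "puppy"), and the top level groups them under a broader cluster, which mirrors the synonym and hypernym-hyponym structure mentioned above.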

4. Incorporating situational knowledge and contextual knowledge into response generation

Besides world knowledge and lexical knowledge, dialogue responses rely on situational knowledge and contextual knowledge. This thesis also incorporates situational knowledge and contextual knowledge into response generation. Existing response generation models for scenarios with given persona descriptions suffer from two important issues: (1) they ignore the relations among personas; (2) they use the same decoder for all personas. Such methods cannot capture much persona information or generate personalized responses well. This thesis proposes a persona-adaptive network that incorporates personas into response generation dynamically, in which a multi-perspective graph models the relation between input messages and personas and also captures the interaction among different personas. Furthermore, this thesis introduces a persona-adaptive decoder whose parameters are built dynamically for different scenes in order to generate more personalized responses. Experiments on a public dataset demonstrate that the proposed model outperforms the baselines in both automatic and human evaluation, which indicates that it can effectively perceive persona scenes in dialogue systems.
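One simple way to build decoder parameters dynamically is to blend a set of basis weight matrices with persona-dependent coefficients, as sketched below. The abstract does not describe how the thesis constructs its dynamic parameters, so the basis-mixing scheme and the names `adaptive_weights` and `apply_weights` are hypothetical.

```python
def adaptive_weights(persona_gate, basis):
    """Blend basis weight matrices using persona-dependent coefficients."""
    rows, cols = len(basis[0]), len(basis[0][0])
    return [
        [sum(g * b[r][c] for g, b in zip(persona_gate, basis)) for c in range(cols)]
        for r in range(rows)
    ]


def apply_weights(weights, x):
    """Matrix-vector product: one linear decoder layer."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


# Two hypothetical basis matrices shared by all personas.
basis = [
    [[1.0, 0.0], [0.0, 1.0]],  # passes the decoder state through unchanged
    [[0.0, 1.0], [1.0, 0.0]],  # swaps the two state dimensions
]
state = [2.0, 3.0]

# Different persona gates yield different effective decoder parameters.
out_a = apply_weights(adaptive_weights([1.0, 0.0], basis), state)
out_b = apply_weights(adaptive_weights([0.0, 1.0], basis), state)
```

The same decoder state produces different outputs under the two gates, which is the behavior a shared, persona-independent decoder cannot provide.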

The previous model relies on situational knowledge provided by given persona descriptions. However, most dialogues have no such persona descriptions, and research on response generation for multi-party conversations is lacking. To this end, this thesis extends dialogue response generation to multi-party conversations and proposes a multi-party response generation model that incorporates interlocutor-aware contextual background. The model captures interlocutor background from dialogue contexts rather than from given persona descriptions, where the contextual interlocutor background includes information about the speaker, the addressee, and the observers. Moreover, this thesis uses an addressee memory to enhance contextual interlocutor information. Finally, this thesis constructs a corpus for this task based on an existing open-access dataset. Experiments demonstrate that the proposed model markedly outperforms strong baselines and effectively captures interlocutor background from dialogue contexts.

Pages: 144
Language: Chinese
Document Type: Dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/39198
Collection: Graduates_Doctoral Dissertations
Recommended Citation
GB/T 7714
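Liu Cao. Research on Key Technologies of Knowledge-Fused Generative Dialogue Systems [D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2020.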
Files in This Item:
File Name/Size | DocType | Version | Access | License
博士学位论文-刘操.pdf (4466KB) | Dissertation | | Restricted access | CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.