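面向对话文本的自动摘要关键技术研究 (Research on Key Techniques of Automatic Summarization for Dialogue Text)
Lin Haitao (林海涛)
2023-05-26
Pages: 106
Degree type: Doctoral
Chinese Abstract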

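With the continuing development of the internet and mobile technology, people exchange information through dialogue more and more frequently. When a dialogue has many turns, readers must spend considerable time reading all of it to understand the main points under discussion. Automatic summarization can condense the key content of a text so that readers obtain information more quickly. For summarization purposes, however, dialogue text differs markedly from general text: a dialogue consists of interleaved utterances from multiple roles, topics shift frequently, and key information is scattered across the dialogue. These characteristics make summarization methods designed for general text difficult to apply to dialogue. This thesis therefore focuses on automatic summarization methods for dialogue text and studies, at both the data level and the model level, how to generate higher-quality dialogue summaries. Its main innovations and contributions are as follows: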

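1. Constructed a relatively large-scale Chinese fine-grained dialogue summarization dataset

Most existing dialogue summarization datasets contain only a summary of the whole dialogue, and almost all of them are in English, which severely limits the application scenarios of dialogue summarization. To address this, the thesis constructs a Chinese dialogue summarization dataset with fine-grained annotations. Because dialogue text involves multiple roles and multiple topics, the dataset annotates a summary of each role's own viewpoints, and every summary is divided by topic; the resulting summaries are called role-granularity dialogue summaries and topic-granularity dialogue summaries. On this dataset, the thesis compares the performance of existing supervised and unsupervised summarization methods. In particular, because existing unsupervised methods struggle to extract the key content of a dialogue, the thesis proposes an unsupervised dialogue summarization method based on utterance generation difficulty. The method uses a dialogue generation model to measure how different preceding contexts affect the generation of an utterance, uses this measurement to model the relevance and informativeness of dialogue utterances, and extracts the key utterances as the summary. Experiments show that, compared with existing unsupervised methods, the method significantly improves summary quality and is robust, and that the key utterances it extracts agree closely with human-annotated key utterances. The thesis also identifies the problems and challenges that existing methods face on this dataset, laying a foundation for subsequent research.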

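2. Proposed a role-granularity dialogue summarization method based on role interaction

During a dialogue, different roles usually interact frequently. When generating a summary for one role, existing dialogue summarization methods seldom take into account the information provided by the other roles. To address this, the thesis proposes a role-granularity dialogue summarization method based on role interaction. The method contains two role interaction modules that exploit the relevance and complementarity of information across roles, extracting information about the other roles from both the dialogue content and the summary content to assist in generating the summary for a given role. Experiments show that the method significantly outperforms the best existing methods on multiple datasets, can be applied to a variety of summarization models, and substantially alleviates the semantic incompleteness of summaries generated by existing methods.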

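3. Proposed a topic-granularity dialogue summarization method based on topic auxiliary tasks

As the number of dialogue turns grows, the topic of the dialogue may change. In practical applications, readers sometimes care only about the content related to one topic. For this situation, the thesis proposes the topic-granularity dialogue summarization task, which aims to generate the summary of a dialogue under a given topic. To solve this task, the thesis proposes a topic-granularity dialogue summarization method based on topic auxiliary tasks. The method uses three topic-related auxiliary tasks: dialogue topic identification, topic utterance attention restriction, and topic summary discrimination. Their purpose is to model topic changes in the dialogue more accurately and to improve the relevance of the generated summary to the topic. Experiments show that the method significantly improves topic-granularity summary quality over the best existing methods and outputs topic-relevant summary content more accurately when the dialogue structure is complex.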

English Abstract

With the development of the internet and mobile technology, people increasingly exchange information through dialogues. When dialogues become lengthy, it is time-consuming and challenging for readers to comprehend the entire content and identify the key points. Automatic summarization can condense the key content of a text and enable readers to access information quickly. However, summarizing dialogues differs significantly from summarizing general texts: a dialogue involves interactions between multiple roles, topics change frequently, and key information is scattered throughout the dialogue. These features make general text summarization methods difficult to apply to dialogue scenarios. Therefore, this thesis focuses on automatic dialogue summarization methods at both the data level and the model level, aiming to generate higher-quality dialogue summaries. The main innovations and contributions of this thesis are summarized as follows:

1. Constructing a Large-Scale Chinese Fine-Grained Dialogue Summarization Dataset

Most existing dialogue summarization datasets contain summaries only for the entire dialogue, and nearly all of them are in English, which greatly limits the application scenarios of dialogue summarization. To address this issue, this thesis constructs a fine-grained Chinese dialogue summarization dataset that can be applied to multiple scenarios. Because a dialogue contains multiple roles and topics, each dialogue in this dataset is annotated with role-oriented summaries, and each summary is divided by topic. Based on this dataset, the thesis compares the performance of existing supervised and unsupervised summarization methods in generating the different kinds of summaries. In particular, because existing unsupervised methods perform poorly at extracting key dialogue utterances, the thesis proposes an unsupervised dialogue summarization method based on utterance generation difficulty. The method uses a dialogue response generation model to assess how different contexts influence the generation of an utterance, uses the result to measure the relevance and informativeness of each utterance, and extracts the key utterances as the summary. Experimental results show that the proposed method significantly improves summary quality over existing unsupervised methods. It is also robust, and the extracted utterances are highly consistent with human annotations. Additionally, the thesis points out the problems and challenges that existing methods face on this dataset, laying a solid foundation for future research in this area.
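The idea of scoring utterances by how much they ease the generation of other utterances can be illustrated with a toy sketch. It is not the thesis's method: the abstract does not give the scoring formula or the model, so everything here is an assumption made for illustration. In place of a neural dialogue generation model, the hypothetical difficulty function uses a smoothed unigram score; the pairwise "difficulty reduction" score, the function names, and the interpolation weight lam are likewise invented for this example.

```python
import math
from collections import Counter


def difficulty(target, context, vocab_size, lam=0.5):
    """Average per-token negative log score of `target` given `context`.

    Toy stand-in (assumption) for a dialogue generation model: each token
    score mixes its frequency in the context with a uniform background
    term. With an empty context only the background term remains.
    """
    tokens = target.split()
    ctx = Counter(context.split())
    ctx_len = sum(ctx.values())
    nll = 0.0
    for tok in tokens:
        p_ctx = ctx[tok] / ctx_len if ctx_len else 0.0
        p = lam * p_ctx + (1.0 - lam) / vocab_size
        nll -= math.log(p)
    return nll / max(len(tokens), 1)


def score_utterances(dialogue, lam=0.5):
    """Score utterance j by the total drop in generation difficulty of
    every other utterance when j is supplied as its context."""
    vocab_size = len({tok for utt in dialogue for tok in utt.split()})
    scores = []
    for j, ctx in enumerate(dialogue):
        gain = 0.0
        for i, target in enumerate(dialogue):
            if i != j:
                gain += (difficulty(target, "", vocab_size, lam)
                         - difficulty(target, ctx, vocab_size, lam))
        scores.append(gain)
    return scores


def extract_summary(dialogue, k=2, lam=0.5):
    """Return the k highest-scoring utterances in dialogue order."""
    scores = score_utterances(dialogue, lam)
    ranked = sorted(range(len(dialogue)), key=lambda j: -scores[j])
    return [dialogue[j] for j in sorted(ranked[:k])]


dialogue = [
    "refund order delayed",
    "hello",
    "order refund approved",
    "thanks bye",
]
summary = extract_summary(dialogue, k=2)
# summary == ["refund order delayed", "order refund approved"]
```

In this toy setting, greetings and sign-offs share no tokens with the other utterances, so they do not make any other utterance easier to generate and receive a score of zero. A real system would replace the unigram score with likelihoods from a trained dialogue model.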

2. Proposing a Role-Oriented Dialogue Summarization Method based on Role Interactions

Different roles in a dialogue usually interact frequently. Existing methods do not take sufficient account of information from other roles when generating the summary for a given role. To solve this problem, this thesis proposes a role-oriented dialogue summarization method based on role interactions. The method contains two role interaction modules and leverages the relevance and complementarity of information from different roles to extract information from both the dialogue content and the summary content. The extracted information is then used to help generate the summary for a given role. Experimental results show that the proposed method significantly outperforms the state-of-the-art methods on multiple datasets and is adaptable to multiple summarization models. It effectively alleviates the problem of incomplete summaries generated by existing methods.

3. Proposing a Topic-Oriented Dialogue Summarization Method based on Topic-Related Auxiliary Tasks

As the number of dialogue turns increases, the topics under discussion may change frequently. In practical applications, readers may be interested only in the dialogue content related to a single topic. For this scenario, this thesis proposes the topic-oriented dialogue summarization task, which aims to generate a summary of the dialogue content related to a given topic. To solve this task, the thesis proposes a topic-oriented dialogue summarization method based on topic-related auxiliary tasks. The method employs three such tasks: dialogue topic identification, topic-related utterance attention restriction, and topic-oriented summary distinguishing. These tasks aim to model topic changes in the dialogue more accurately and to improve the relevance of generated summaries to the given topic. Experiments show that, compared with the state-of-the-art method, the proposed method greatly improves topic-oriented summary quality, and that it outputs summary content related to the given topic more accurately when the dialogue structure is complex.
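A common way to train a summarizer jointly with auxiliary tasks is to minimize a weighted sum of losses. The formula below is only an illustrative sketch of that general pattern. The abstract does not say how the three auxiliary tasks are combined with the summarization objective, so the additive form, the loss symbols, and the weights are all assumptions.

```latex
\[
\mathcal{L}
  = \mathcal{L}_{\text{sum}}
  + \lambda_{1}\,\mathcal{L}_{\text{topic}}
  + \lambda_{2}\,\mathcal{L}_{\text{attn}}
  + \lambda_{3}\,\mathcal{L}_{\text{dist}},
\qquad \lambda_{1}, \lambda_{2}, \lambda_{3} \ge 0
\]
```

Here L_sum stands for the summary generation loss, L_topic for dialogue topic identification, L_attn for the topic-related utterance attention restriction, and L_dist for topic-oriented summary distinguishing. The hypothetical weights λ would be tuned on validation data.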

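Keywords: dialogue summarization; data annotation; unsupervised methods; role interaction; multi-task learning
Subject area: Natural language processing
Discipline category: Engineering
Language: Chinese
Seven major directions, sub-direction classification: Natural language processing
State Key Laboratory planned direction classification: Speech and language processing
Is there a thesis-associated dataset that needs to be deposited:
Document type: Dissertation
Item identifier: http://ir.ia.ac.cn/handle/173211/51968
Collection: Graduates_Doctoral dissertations
Corresponding author: Lin Haitao (林海涛)
Recommended citation
GB/T 7714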
林海涛. 面向对话文本的自动摘要关键技术研究[D],2023.
Files in this item
File name / size | Document type | Version type | Access type | License
面向对话文本的自动摘要关键技术研究-林海 (6631KB) | Dissertation | | Restricted access | CC BY-NC-SA