CASIA OpenIR > Graduates > Doctoral Dissertations
Research on Automatic Summarization Methods Based on Multi-Source Information Fusion (多源信息融合的自动摘要方法研究)
Zhu, Junnan (朱军楠)
2020-05-28
Pages: 122
Degree type: Doctoral
Chinese Abstract

With the acceleration of global information exchange and the rapid development of multimedia technology, multi-source information on the Internet is growing day by day. Compared with single-source information, multi-source information provides richer content: a scene combining images and text delivers a more direct visual impact than plain text, and news reports on the same topic in different languages offer multiple perspectives. Multimodal and multilingual information can therefore complement and reinforce each other, which helps generate more accurate summaries. However, existing research on automatic summarization usually focuses on monolingual text and ignores the correlations between multimodal and multilingual information, causing a certain degree of information loss and preventing existing methods from producing high-quality summaries in multi-source scenarios. Accordingly, this thesis focuses on automatic summarization methods based on multi-source information fusion, aiming to integrate multimodal and multilingual information to improve summary quality.

The main contributions and innovations of this thesis are summarized as follows:

1. A pictorial summary evaluation method that accounts for both intra-modal saliency and inter-modal relevance

To address the problem that existing automatic evaluation methods for summary quality cannot assess the image-and-text summaries produced by multimodal summarization, this thesis proposes an evaluation method that accounts for both intra-modal saliency and inter-modal relevance. The method decomposes evaluation into three aspects: text saliency, image saliency, and text-image relevance; the three aspect scores are then weighted by linear regression to obtain the final overall score of the summary. Experiments show that, compared with existing automatic summarization evaluation methods, the proposed method correlates more strongly with human judgments and is therefore better suited to evaluating pictorial summaries.

2. A pictorial summary generation method incorporating a multimodal attention mechanism

To address the problem that current automatic summarization methods can only exploit textual information to generate single-modality summaries, this thesis proposes a pictorial summary generation method that incorporates a multimodal attention mechanism. The method encodes the semantics of images and text jointly, better capturing the alignment between them, and uses the semantic representations of the different modalities to reinforce key information and generate more accurate pictorial summaries. Experiments on an automatically constructed dataset show that the proposed method outperforms existing text summarization methods in both automatic metrics and human evaluation. The method generalizes well: it applies when images and text have no explicit alignment, and it transfers to other application scenarios and tasks.

3. An abstractive end-to-end multilingual summarization method

To address the error-propagation problem in pipeline-based multilingual summarization, this thesis proposes an abstractive end-to-end multilingual summarization method. By integrating the "translating" mode of cross-lingual summarization with the "copying" mode of monolingual summarization, the method enables both the construction of training data under zero-resource conditions and the semantic fusion of mixed multilingual input. Experiments show that, compared with traditional pipeline-based multilingual summarization methods, it significantly improves summary quality.

English Abstract

With the acceleration of global information exchange and the rapid development of multimedia technology, multi-source information on the Internet is increasing day by day. Compared with single-source information, multi-source information can provide richer content. For example, scenes with pictures and texts can bring more visual impact than pure texts, and news reports on the same topic in different languages can provide more perspectives. Multimodal and multilingual information can therefore complement and reinforce each other, which helps generate more accurate summaries. However, existing research on automatic summarization usually focuses on monolingual text, ignoring the correlation between multimodal and multilingual information. This causes a certain degree of information loss and leaves existing methods unable to obtain high-quality summaries in multi-source information scenarios. Based on this, this paper focuses on automatic summarization methods based on multi-source information fusion, aiming to integrate multimodal and multilingual information to improve summary quality.

The main contributions of this paper are as follows.

1. An automatic evaluation method for pictorial summaries incorporating both intra-modal saliency and inter-modal relevance

To address the problem that existing automatic summarization evaluation methods cannot evaluate pictorial summaries, this paper proposes, for the first time, a pictorial summary evaluation method that takes both intra-modal saliency and inter-modal relevance into account. The method divides the evaluation problem into three aspects: text saliency, image saliency, and text-image relevance. The scores of these three aspects are weighted by linear regression to obtain the final overall score of the pictorial summary. Experiments show that the proposed method correlates with human evaluation more strongly than existing automatic summarization evaluation methods, indicating that it is better suited to evaluating pictorial summaries.
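The abstract does not give implementation details, but the weighting step it describes can be sketched as ordinary linear regression over the three aspect scores. The scores and human ratings below are purely illustrative, and `overall_score` is a hypothetical helper name:

```python
import numpy as np

# Hypothetical per-summary scores for the three aspects named above:
# [text saliency, image saliency, text-image relevance].
aspect_scores = np.array([
    [0.8, 0.6, 0.7],
    [0.4, 0.3, 0.5],
    [0.9, 0.7, 0.6],
    [0.5, 0.8, 0.4],
])
# Illustrative overall human ratings used to fit the weights.
human_overall = np.array([0.7, 0.4, 0.75, 0.55])

# Fit linear-regression weights (with a bias term) mapping the three
# aspect scores to the overall human rating.
X = np.hstack([aspect_scores, np.ones((len(aspect_scores), 1))])
weights, *_ = np.linalg.lstsq(X, human_overall, rcond=None)

def overall_score(text_sal, image_sal, relevance):
    """Weighted combination giving the final pictorial-summary score."""
    return float(np.dot(weights, [text_sal, image_sal, relevance, 1.0]))
```

In practice the weights would be fitted on a held-out set of human-rated summaries rather than the four toy rows above.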

2. A generation method for pictorial summaries incorporating a multimodal attention mechanism

To address the problem that current automatic summarization methods can only use textual information to generate single-modality summaries, this paper proposes a pictorial summary generation method that incorporates a multimodal attention mechanism. The method encodes the semantic information of the image and the text simultaneously and better captures the alignment between them, thereby enhancing key information with the semantic representations of different modalities and generating more accurate pictorial summaries. Experiments show that, on an automatically constructed dataset, the proposed method outperforms traditional text summarization methods in both automatic metrics and human evaluation. The method has good generality: it can be applied when there is no explicit alignment between images and text, and it can also be transferred to other application scenarios and tasks.
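One common way to realize such a mechanism, not necessarily the exact architecture of the thesis, is hierarchical attention: the decoder query first attends within each modality, then a modality-level weighting fuses the two context vectors. A minimal NumPy sketch under that assumption (all names hypothetical):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def multimodal_attention(query, text_states, image_states):
    """Fuse text-token states and image-region features for one decoding step.

    query        : (d,)   decoder hidden state
    text_states  : (n, d) encoder states of the text tokens
    image_states : (m, d) projected image-region features
    """
    # Intra-modality attention: distribute weight over tokens / regions.
    a_txt = softmax(text_states @ query)
    a_img = softmax(image_states @ query)
    c_txt = a_txt @ text_states     # text context vector, (d,)
    c_img = a_img @ image_states    # image context vector, (d,)
    # Modality-level attention: weight each context by its affinity to the query.
    scores = softmax(np.array([c_txt @ query, c_img @ query]))
    return scores[0] * c_txt + scores[1] * c_img
```

A real model would use learned projections and train end-to-end; the sketch only illustrates how one query can weigh information from two modalities at once.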

3. An abstractive end-to-end multilingual summarization method

To address the error-propagation problem in pipeline-based multilingual summarization methods, this paper proposes an abstractive end-to-end multilingual summarization method. The method integrates the "translating" mode of cross-lingual summarization with the "copying" mode of monolingual summarization, enabling both the construction of training data and the semantic fusion of mixed multilingual input in a zero-resource setting. Experiments show that, compared with traditional pipeline-based multilingual summarization methods, the proposed method significantly improves summary quality.
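The "translating"/"copying" combination described above resembles the gated mixture used in pointer-generator networks: at each step the final word distribution interpolates a vocabulary (translation) distribution with a copy distribution over source words. The following sketch assumes that mixture form; the thesis may implement the gate differently:

```python
def mix_modes(p_translate, p_copy, gate):
    """Final word distribution as a gated mixture of the 'translating'
    distribution (over the target-language vocabulary) and the 'copying'
    distribution (over source words).

    gate : float in [0, 1], predicted by the decoder at each step;
           gate = 1 means pure translation, gate = 0 means pure copying.
    """
    vocab = set(p_translate) | set(p_copy)
    return {w: gate * p_translate.get(w, 0.0) + (1 - gate) * p_copy.get(w, 0.0)
            for w in vocab}
```

Because both inputs are probability distributions and the gate is a convex weight, the mixture is itself a valid distribution, which is what lets the two modes be trained jointly end to end.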

Keywords: automatic summarization, multilingual, multimodal, evaluation method
Language: Chinese
Sub-direction classification: Natural Language Processing
Document type: Dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/39081
Collection: Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714):
朱军楠. 多源信息融合的自动摘要方法研究[D]. 远程答辩. 中国科学院大学, 2020.
Files in this item:
Thesis.pdf (5794KB) · Dissertation · Restricted access · CC BY-NC-SA
