Research on Multimodal Sentiment Analysis Methods Fusing Images and Text (融合图像与文本的多模态情感分析方法研究)
Author: 徐楠 (Nan Xu)
Subtype: Doctoral dissertation
Thesis Advisor: 毛文吉 (Wenji Mao)
Date: 2020-05-30
Degree Grantor: University of Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Name: Doctor of Engineering (工学博士)
Degree Discipline: Computer Application Technology
Keywords: multimodal sentiment analysis; image semantics; information interaction; aspect level; multimodal data augmentation
Abstract

With the development of the Internet, social media has become an important platform for users to publish personal opinions, share information, and express sentiment. As a key research topic in social media analysis, sentiment analysis underpins many practical problems such as public opinion monitoring, word-of-mouth marketing, and product recommendation, and it has significant research and application value in fields such as public security and business.

With the spread of multimodal social media platforms such as TikTok and Instagram, social media data has become increasingly multimodal (e.g., text and images), improving the efficiency with which users acquire information. However, the complexity and heterogeneity of multimodal data make it harder for computers to understand its content, which poses new challenges for traditional text-based sentiment analysis. This thesis studies multimodal sentiment analysis methods that fuse images and text. Working at the levels of multimodal data representation, feature fusion, aspect modeling, and data processing, it focuses on the semantic association between modalities, cross-modal information interaction, fine-grained aspect-level sentiment modeling, and multimodal data augmentation, with the goal of enhancing computers' perception of multimodal data and improving the performance of multimodal sentiment analysis models.

The main research contributions of this thesis are as follows:

  1. To address the fact that existing multimodal sentiment analysis methods ignore the implicit semantic information contained in images at the data representation level, this thesis proposes two models: one based on image captions and one based on image semantic features. The former uses an image caption generator to translate an image into descriptive text that enriches visual semantic understanding, and applies a hierarchical semantic attention network to focus on keywords in the caption as well as keywords and key sentences in the text. The latter explicitly extracts scene and object features from the image, and uses these semantic features to guide an attention mechanism toward the textual content most related to the image (see the first sketch after this list).

  2. To address the fact that existing methods ignore the complementary, mutually reinforcing information across modalities at the feature fusion level, this thesis proposes a multimodal sentiment analysis model based on information interaction. It is the first to model the interplay between image and text in this task: a co-memory mechanism lets the image retrieve textual keywords and lets the text locate key image regions, capturing the sentiment-bearing content of both modalities and providing interpretable evidence for the prediction (see the second sketch after this list).

  3. To address the lack of aspect-level research in multimodal sentiment analysis, this thesis introduces a new task, aspect-level multimodal sentiment analysis, and builds the first model based on aspect-level information interaction. The model captures the complex interactions from aspect to text, from aspect to image, and between image and text, using a multimodal memory network and a multi-interactive attention mechanism to focus on the content in both modalities that is relevant to the given aspect (see the third sketch after this list). To validate the model, the thesis also releases a benchmark dataset for aspect-level multimodal sentiment analysis, an important data resource for this new task.

  4. To address the scarcity of labeled samples in multimodal analysis tasks and the fact that existing data augmentation methods target only unimodal data, this thesis introduces the task of multimodal data augmentation and builds an augmentation framework based on cross-modal matching. The framework uses a pretrained cross-modal matcher to automatically synthesize multimodal samples from existing labeled unimodal datasets; after selection and filtering, the synthetic data is used to enhance a multimodal classifier (see the pipeline sketch after this list).
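
To make the image-semantics-guided attention of contribution 1 concrete, below is a minimal PyTorch sketch. It assumes the image has already been encoded into a single semantic vector (e.g., pooled scene/object features) and the text into per-word features; the module name, dimensions, and additive-attention form are illustrative assumptions, not the thesis's exact architecture.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SemanticGuidedAttention(nn.Module):
        """Attend over text word features, using an image semantic vector as the query."""
        def __init__(self, text_dim, img_dim, hidden_dim):
            super().__init__()
            self.w_text = nn.Linear(text_dim, hidden_dim)
            self.w_img = nn.Linear(img_dim, hidden_dim)
            self.v = nn.Linear(hidden_dim, 1, bias=False)

        def forward(self, text_feats, img_sem):
            # text_feats: (B, seq_len, text_dim); img_sem: (B, img_dim)
            scores = self.v(torch.tanh(self.w_text(text_feats)
                                       + self.w_img(img_sem).unsqueeze(1)))  # (B, seq_len, 1)
            alpha = F.softmax(scores, dim=1)          # weights over words
            return (alpha * text_feats).sum(dim=1)    # image-guided text summary

    # Hypothetical usage with random features:
    att = SemanticGuidedAttention(text_dim=300, img_dim=512, hidden_dim=256)
    summary = att(torch.randn(8, 40, 300), torch.randn(8, 512))  # -> (8, 300)

The same additive form can be stacked at word and sentence granularity for the hierarchical variant described in contribution 1.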
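
For contribution 2, here is a simplified sketch of an alternating co-memory read. It assumes word and region features have been projected into a shared dimension d; the hop count and mean-pooled initialization are assumptions for illustration, and the thesis's actual memory-update rules may differ.

    import torch

    def attend(query, keys):
        """Dot-product attention: query (B, d), keys (B, n, d) -> summary (B, d)."""
        alpha = torch.softmax(torch.bmm(keys, query.unsqueeze(2)), dim=1)  # (B, n, 1)
        return (alpha * keys).sum(dim=1)

    def co_memory(word_feats, region_feats, hops=2):
        """Alternate image->text and text->image retrieval, then fuse."""
        txt = word_feats.mean(dim=1)      # initial text summary
        img = region_feats.mean(dim=1)    # initial image summary
        for _ in range(hops):
            txt = attend(img, word_feats)     # image retrieves textual keywords
            img = attend(txt, region_feats)   # text locates key image regions
        return torch.cat([txt, img], dim=-1)  # fused feature for the sentiment classifier

The attention weights are also what supply the interpretable evidence mentioned above: they indicate which words and regions the prediction relied on.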
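
For contribution 3, the sketch below illustrates an aspect-conditioned multi-hop read over separate text and image memories. Treating the query at each hop as the aspect vector plus the other modality's current state is a simplifying assumption; the thesis's multi-interactive attention mechanism is more elaborate.

    import torch

    def attend(query, keys):
        alpha = torch.softmax(torch.bmm(keys, query.unsqueeze(2)), dim=1)
        return (alpha * keys).sum(dim=1)

    def multi_interactive_read(aspect, word_mem, region_mem, hops=3):
        """aspect: (B, d); word_mem: (B, n_words, d); region_mem: (B, n_regions, d)."""
        txt_state, img_state = aspect, aspect     # both reads start from the aspect
        for _ in range(hops):
            # condition each modality's read on the aspect and the other modality,
            # so textual and visual evidence about the aspect reinforce each other
            txt_state = attend(aspect + img_state, word_mem)
            img_state = attend(aspect + txt_state, region_mem)
        return torch.cat([txt_state, img_state], dim=-1)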
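
Finally, for contribution 4, here is a plain-Python sketch of the cross-modal-matching augmentation pipeline. The matcher.score(text, image) interface, the confidence threshold, and the top-k selection are hypothetical stand-ins for the thesis's pretrained matcher and its filtering procedure.

    def augment(labeled_texts, image_pool, matcher, threshold=0.8, top_k=1):
        """Pair labeled texts with their best-matching images to synthesize
        multimodal samples, keeping only confident matches."""
        synthetic = []
        for text, label in labeled_texts:
            # matcher.score is a hypothetical interface returning relevance in [0, 1]
            scored = sorted(((matcher.score(text, img), img) for img in image_pool),
                            key=lambda p: p[0], reverse=True)
            for score, img in scored[:top_k]:
                if score >= threshold:      # filtering: drop weak cross-modal matches
                    synthetic.append((text, img, label))
        return synthetic  # appended to the multimodal classifier's training set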


Pages: 126
Language: Chinese
Document Type: Dissertation (学位论文)
Identifier: http://ir.ia.ac.cn/handle/173211/39149
Collection: 复杂系统管理与控制国家重点实验室_互联网大数据与信息安全 (State Key Laboratory of Management and Control for Complex Systems: Internet Big Data and Information Security)
Corresponding Author: 徐楠 (Nan Xu)
Recommended Citation (GB/T 7714):
徐楠. 融合图像与文本的多模态情感分析方法研究[D]. 中国科学院自动化研究所. 中国科学院大学,2020.
Files in This Item:
File Name/Size: 融合图像与文本的多模态情感分析方法研究. (4227 KB) | DocType: Dissertation | Access: Open Access | License: CC BY-NC-SA
