VLP2MSA: Expanding vision-language pre-training to multimodal sentiment analysis
Yi, Guofeng1; Fan, Cunhang1; Zhu, Kang1; Lv, Zhao1; Liang, Shan5; Wen, Zhengqi4; Pei, Guanxiong3; Li, Taihao3; Tao, Jianhua2
Journal: KNOWLEDGE-BASED SYSTEMS
ISSN: 0950-7051
Published: 2024-01-11
Volume: 283; Pages: 9
Corresponding author: Fan, Cunhang (cunhang.fan@ahu.edu.cn)
Abstract: Large-scale vision-and-language representation learning has improved performance on various joint vision-language downstream tasks. In this work, our objective is to extend it effectively to multimodal sentiment analysis and address two urgent challenges in this field: (1) the low contribution of the visual modality, and (2) the design of an effective multimodal fusion architecture. To overcome the imbalance between the visual and textual modalities, we propose an inter-frame hybrid transformer, which extends the recent CLIP and TimeSformer architectures. This module extracts spatiotemporal features from sparsely sampled video frames, not only focusing on facial expressions but also capturing body-movement information, providing a more comprehensive visual representation than the traditional direct use of pre-extracted facial features. Additionally, we tackle the challenge of modality heterogeneity in the fusion architecture by introducing a new scheme that prompts and aligns the video and text information before fusing them. Specifically, we generate discriminative text prompts based on the video content to enhance the text representation, and align the unimodal video-text features using a video-text contrastive loss. Our proposed end-to-end trainable model achieves state-of-the-art performance on three widely used datasets (MOSI, MOSEI, and CH-SIMS) using only two modalities. These experimental results validate the effectiveness of our approach for multimodal sentiment analysis.
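The abstract aligns unimodal video and text features with a video-text contrastive loss before fusion. The paper's exact formulation is not reproduced in this record, so the following is only a minimal NumPy sketch of the standard symmetric (CLIP-style) InfoNCE contrastive objective that such alignment schemes typically use; the function name, batch shapes, and temperature value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def video_text_contrastive_loss(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired video/text embeddings.

    video_emb, text_emb: (batch, dim) arrays; row i of each is a matched pair.
    """
    # L2-normalize so the dot product becomes cosine similarity
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    logits = v @ t.T / temperature      # (batch, batch) similarity matrix
    labels = np.arange(len(v))          # matched pairs lie on the diagonal

    def cross_entropy(lg, lb):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(lb)), lb].mean()

    # Average the video-to-text and text-to-video directions
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

rng = np.random.default_rng(0)
loss = video_text_contrastive_loss(rng.normal(size=(4, 8)), rng.normal(size=(4, 8)))
print(float(loss))
```

Intuitively, each video embedding is pulled toward its own caption's embedding and pushed away from the other captions in the batch (and vice versa), which is what lets the unimodal features be aligned before the fusion stage.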
Keywords: Multimodal sentiment analysis; Vision-language; Multimodal fusion
DOI: 10.1016/j.knosys.2023.111136
Indexed by: SCI
Language: English
Funding projects: STI 2030-Major Projects [2021ZD0201500]; National Natural Science Foundation of China (NSFC) [62201002]; National Natural Science Foundation of China (NSFC) [61972437]; Excellent Youth Foundation of Anhui Scientific Committee [2208085J05]; Special Fund for Key Program of Science and Technology of Anhui Province [202203a07020008]; Open Research Projects of Zhejiang Lab [2021KH0AB06]; Open Projects Program of National Laboratory of Pattern Recognition [202200014]
Funders: STI 2030-Major Projects; National Natural Science Foundation of China (NSFC); Excellent Youth Foundation of Anhui Scientific Committee; Special Fund for Key Program of Science and Technology of Anhui Province; Open Research Projects of Zhejiang Lab; Open Projects Program of National Laboratory of Pattern Recognition
WOS Research Area: Computer Science
WOS Category: Computer Science, Artificial Intelligence
WOS Accession Number: WOS:001108284900001
Publisher: ELSEVIER
Citation statistics
Cited times (WOS): 1
Document type: Journal article
Identifier: http://ir.ia.ac.cn/handle/173211/55123
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems_Intelligent Interaction
Affiliations:
1. Anhui Univ, Sch Comp Sci & Technol, Anhui Prov Key Lab Multimodal Cognit Comp, Hefei, Peoples R China
2.Tsinghua Univ, Dept Automat, Beijing, Peoples R China
3.Zhejiang Lab, Inst Artificial Intelligence, Hangzhou, Peoples R China
4.Qiyuan Lab, Beijing, Peoples R China
5.Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
Recommended citation:
GB/T 7714
Yi, Guofeng, Fan, Cunhang, Zhu, Kang, et al. VLP2MSA: Expanding vision-language pre-training to multimodal sentiment analysis[J]. KNOWLEDGE-BASED SYSTEMS, 2024, 283: 9.
APA Yi, G., Fan, C., Zhu, K., Lv, Z., Liang, S., ... & Tao, J. (2024). VLP2MSA: Expanding vision-language pre-training to multimodal sentiment analysis. Knowledge-Based Systems, 283, 9.
MLA Yi, Guofeng, et al. "VLP2MSA: Expanding vision-language pre-training to multimodal sentiment analysis." Knowledge-Based Systems 283 (2024): 9.
Files in this item:
No files are associated with this item.