Visual Question Answering With Dense Inter- and Intra-Modality Interactions
Liu, Fei [1,2]; Liu, Jing [1,2]; Fang, Zhiwei [1,2]; Hong, Richang [3]; Lu, Hanqing [1,2]
Journal: IEEE TRANSACTIONS ON MULTIMEDIA
ISSN: 1520-9210
Year: 2021
Volume: 23; Pages: 3518-3529
Abstract

Learning effective interactions between multi-modal features is at the heart of visual question answering (VQA). A common defect of existing VQA approaches is that they consider only a very limited number of inter-modality interactions, which may not be enough to model the latent, complex image-question relations needed to answer questions accurately. Moreover, most methods neglect intra-modality interactions, which are also important to VQA. In this work, we propose a novel DenIII framework for modeling dense inter- and intra-modality interactions. It densely connects all pairwise layers of the network via the proposed Inter- and Intra-modality Attention Connectors, capturing fine-grained interplay across all hierarchical levels. The Inter-modality Attention Connector efficiently connects the multi-modality features at any two layers with bidirectional attention, capturing the inter-modality interactions, while the Intra-modality Attention Connector connects the features of the same modality with unidirectional attention, modeling the intra-modality interactions. Extensive ablation studies and visualizations validate the effectiveness of our method, and DenIII achieves state-of-the-art or competitive performance on three publicly available datasets.
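
The connector idea in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names (`attend`, `inter_connector`, `intra_connector`, `dense_forward`), the use of plain scaled dot-product attention without learned projections, the residual additions, and the choice that later-layer features attend to earlier-layer ones in the intra-modality case are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys):
    """Scaled dot-product attention. Each query row is replaced by a
    weighted average of the key rows (keys double as values here)."""
    dim = len(keys[0])
    scale = math.sqrt(dim)
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * k[d] for w, k in zip(weights, keys))
                    for d in range(dim)])
    return out

def add(x, y):
    """Element-wise sum of two equally shaped feature matrices."""
    return [[a + b for a, b in zip(r, s)] for r, s in zip(x, y)]

def inter_connector(img, qst):
    """Bidirectional (assumed form): image attends to question and
    question attends to image; both sides are updated."""
    return add(img, attend(img, qst)), add(qst, attend(qst, img))

def intra_connector(lower, upper):
    """Unidirectional (assumed direction): only `upper` is updated,
    by attending to `lower` features of the same modality."""
    return add(upper, attend(upper, lower))

def dense_forward(img, qst, num_layers=2):
    """Connect every new layer to all earlier layers of both modalities."""
    img_layers, qst_layers = [img], [qst]
    for _ in range(num_layers):
        new_img, new_qst = img_layers[-1], qst_layers[-1]
        for old_img, old_qst in zip(img_layers, qst_layers):
            # Inter-modality, both directions, against an earlier layer.
            new_img, _ = inter_connector(new_img, old_qst)
            _, new_qst = inter_connector(old_img, new_qst)
            # Intra-modality, one direction, against an earlier layer.
            new_img = intra_connector(old_img, new_img)
            new_qst = intra_connector(old_qst, new_qst)
        img_layers.append(new_img)
        qst_layers.append(new_qst)
    return img_layers[-1], qst_layers[-1]

# Toy inputs: 3 image-region vectors and 2 question-word vectors, dimension 2.
img = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
qst = [[0.5, 0.5], [1.0, -1.0]]
out_img, out_qst = dense_forward(img, qst)
```

In this sketch the number of rows per modality is preserved: `out_img` keeps one vector per image region and `out_qst` one vector per question word, so an answer classifier could pool either output.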

Keywords: Visualization; Knowledge discovery; Connectors; Encoding; Task analysis; Image coding; Stacking; Visual question answering; attention; dense interactions
DOI: 10.1109/TMM.2020.3026892
Indexed by: SCI
Language: English
Funding projects: Beijing Natural Science Foundation[4192059] ; Beijing Natural Science Foundation[JQ20022] ; National Natural Science Foundation of China[61922086] ; National Natural Science Foundation of China[61872366] ; National Natural Science Foundation of China[61872364]
Funding organizations: Beijing Natural Science Foundation ; National Natural Science Foundation of China
WOS research areas: Computer Science ; Telecommunications
WOS categories: Computer Science, Information Systems ; Computer Science, Software Engineering ; Telecommunications
WOS accession number: WOS:000709093100007
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Seven major directions, sub-direction classification: Multimodal intelligence
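Citation statistics
Times cited: 23 [WOS]
Document type: Journal article
Item identifier: http://ir.ia.ac.cn/handle/173211/46268
Collection: Zidong Taichu Large Model Research Center_Image and Video Analysis
Corresponding author: Liu, Jing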
Author affiliations:
1. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
2. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
3. Hefei Univ Technol, Sch Comp & Informat, Hefei 230000, Anhui, Peoples R China
First author affiliation: National Laboratory of Pattern Recognition
Corresponding author affiliation: National Laboratory of Pattern Recognition
Recommended citation
GB/T 7714
Liu, Fei,Liu, Jing,Fang, Zhiwei,et al. Visual Question Answering With Dense Inter- and Intra-Modality Interactions[J]. IEEE TRANSACTIONS ON MULTIMEDIA,2021,23:3518-3529.
APA Liu, Fei,Liu, Jing,Fang, Zhiwei,Hong, Richang,&Lu, Hanqing.(2021).Visual Question Answering With Dense Inter- and Intra-Modality Interactions.IEEE TRANSACTIONS ON MULTIMEDIA,23,3518-3529.
MLA Liu, Fei,et al."Visual Question Answering With Dense Inter- and Intra-Modality Interactions".IEEE TRANSACTIONS ON MULTIMEDIA 23(2021):3518-3529.
Files in this item
File name/size; Document type; Version type; Access type; License
Visual_Question_Answ(2891KB); Journal article; Author accepted manuscript; Open access; CC BY-NC-SA
File name: Visual_Question_Answering_With_Dense_Inter-_and_Intra-Modality_Interactions.pdf
Format: Adobe PDF
