Knowledge Commons of Institute of Automation, CAS
So Many Heads, So Many Wits: Multimodal Graph Reasoning for Text-Based Visual Question Answering | |
Zheng, Wenbo1,2; Yan, Lan3,4 | |
Journal | IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS |
ISSN | 2168-2216 |
2023-10-17 | |
Pages | 12 |
Corresponding author | Zheng, Wenbo(zwb2022@whut.edu.cn) |
Abstract | While texts related to images convey fundamental messages for scene understanding and reasoning, text-based visual question answering tasks concentrate on visual questions that require reading texts from images. However, most current methods add multimodal features that are independently extracted from a given image into a reasoning model without considering their inter- and intra-relationships according to three modalities (i.e., scene texts, questions, and images). To this end, we propose a novel text-based visual question answering model, multimodal graph reasoning. Our model first extracts intramodality relationships by taking the representations from identical modalities as semantic graphs. Then, we present graph multihead self-attention, which boosts each graph representation through graph-by-graph aggregation to capture the intermodality relationship. It is a case of "so many heads, so many wits" in the sense that as more semantic graphs are involved in this process, each graph representation becomes more effective. Finally, these representations are reprojected, and we perform answer prediction with their outputs. The experimental results demonstrate that our approach realizes substantially better performance compared with other state-of-the-art models. |
Keywords | Graph attention ; graph reasoning ; multimodal graph self-attention ; text-based visual question answering |
DOI | 10.1109/TSMC.2023.3319964 |
Keywords [WOS] | ATTENTIONS ; LANGUAGE ; VISION |
Indexed by | SCI |
Language | English |
Funding projects | Natural Science Foundation of China[62303361] ; Natural Science Foundation of China[62302161] ; Natural Science Foundation of China[U1811463] ; Hainan Provincial Natural Science Foundation of China[623QN266] ; Fundamental Research Funds for the Central Universities[233110002] ; China National Postdoctoral Program for Innovative Talents[BX20230114] ; National Key Research and Development Program of China[2018AAA0101502] |
Funding organizations | Natural Science Foundation of China ; Hainan Provincial Natural Science Foundation of China ; Fundamental Research Funds for the Central Universities ; China National Postdoctoral Program for Innovative Talents ; National Key Research and Development Program of China |
WOS research areas | Automation & Control Systems ; Computer Science |
WOS categories | Automation & Control Systems ; Computer Science, Cybernetics |
WOS accession number | WOS:001090709300001 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
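Document type | Journal article |
Item identifier | http://ir.ia.ac.cn/handle/173211/54317 |
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems ; State Key Laboratory of Multimodal Artificial Intelligence Systems_Parallel Intelligence Technology and Systems Team |
Corresponding author | Zheng, Wenbo |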
Author affiliations | 1.Wuhan Univ Technol, Sch Comp Sci & Artificial Intelligence, Wuhan 430070, Peoples R China 2.Wuhan Univ Technol, Sanya Sci & Educ Innovat Pk, Sanya 572000, Peoples R China 3.Hunan Univ, Coll Comp Sci & Engn, Changsha 410082, Hunan, Peoples R China 4.Natl Supercomp Ctr, Changsha 410082, Hunan, Peoples R China 5.Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China |
Recommended citation (GB/T 7714) | Zheng, Wenbo,Yan, Lan,Wang, Fei-Yue. So Many Heads, So Many Wits: Multimodal Graph Reasoning for Text-Based Visual Question Answering[J]. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS,2023:12. |
APA | Zheng, Wenbo,Yan, Lan,&Wang, Fei-Yue.(2023).So Many Heads, So Many Wits: Multimodal Graph Reasoning for Text-Based Visual Question Answering.IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS,12. |
MLA | Zheng, Wenbo,et al."So Many Heads, So Many Wits: Multimodal Graph Reasoning for Text-Based Visual Question Answering".IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS (2023):12. |
Files in this item | There are no files associated with this item. |