Knowledge Commons of Institute of Automation, CAS
Erasing-based Attention Learning for Visual Question Answering
Liu, Fei (1, 2)
2019-10
Conference name | Proceedings of the 27th ACM International Conference on Multimedia |
Conference date | 2019-10 |
Conference location | Nice, France |
Publisher | ACM |
Abstract | Attention learning for visual question answering remains a challenging task, where most existing methods treat the attention and the non-attention parts in isolation. In this paper, we propose to enforce the correlation between the attention and the non-attention parts as a constraint for attention learning. We first adopt an attention-guided erasing scheme to obtain the attention and the non-attention parts respectively, and then learn to separate the attention and the non-attention parts by an appropriate distance margin in a feature embedding space. Furthermore, we associate a typical classification loss with the above distance constraint to learn a more discriminative attention map for answer prediction. The proposed approach does not introduce extra model parameters or inference complexity, and can be combined with any attention-based model. Extensive ablation experiments validate the effectiveness of our method, and new state-of-the-art or competitive results on four publicly available datasets are achieved. |
Language | English |
Document type | Conference paper |
Item identifier | http://ir.ia.ac.cn/handle/173211/48673 |
Collection | Zidong Taichu Large Model Research Center_Image and Video Analysis |
Corresponding author | Liu, Jing |
Author affiliations | 1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences 2. University of Chinese Academy of Sciences 3. School of Computer and Information, Hefei University of Technology |
First author affiliation | National Laboratory of Pattern Recognition |
Corresponding author affiliation | National Laboratory of Pattern Recognition |
Recommended citation (GB/T 7714) | Liu, Fei,Liu, Jing,Hong, Richang,et al. Erasing-based Attention Learning for Visual Question Answering[C]:ACM,2019. |
Files in this item |
File name/size | Document type | Version type | Access type | License |
3343031.3350993.pdf (2319KB) | Conference paper | | Open access | CC BY-NC-SA |
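
The abstract above describes three steps: attention-guided erasing, a distance margin between the attention and non-attention parts, and a classification loss combined with that constraint. The following is a minimal sketch of one plausible reading of those steps, not the authors' implementation. The function names, the top-k split, the mean pooling, the Euclidean hinge, and the parameters `keep_ratio`, `margin`, and `weight` are all assumptions introduced here for illustration; the record itself does not specify them.

```python
import numpy as np

def split_by_attention(features, attention, keep_ratio=0.5):
    """Assumed erasing scheme: keep the top-k attended regions and treat
    the rest as the erased (non-attention) part. Requires >= 2 regions."""
    n = attention.shape[0]
    k = int(np.clip(round(keep_ratio * n), 1, n - 1))
    keep = np.zeros(n, dtype=bool)
    keep[np.argsort(attention)[-k:]] = True   # indices of the k largest weights
    attended = features[keep].mean(axis=0)    # pooled attention part
    erased = features[~keep].mean(axis=0)     # pooled non-attention part
    return attended, erased

def margin_loss(attended, erased, margin=1.0):
    """Assumed hinge: penalize the two parts when they are closer than `margin`."""
    distance = np.linalg.norm(attended - erased)
    return max(0.0, margin - distance)

def cross_entropy(logits, label):
    """Standard softmax cross-entropy for a single example."""
    shifted = logits - logits.max()           # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

def total_loss(logits, label, attended, erased, margin=1.0, weight=1.0):
    """Classification loss plus the weighted distance constraint (assumed form)."""
    return cross_entropy(logits, label) + weight * margin_loss(attended, erased, margin)

# Hypothetical toy input: four image regions with 2-d features.
features = np.array([[1.0, 0.0], [0.0, 1.0], [3.0, 3.0], [2.0, 4.0]])
attention = np.array([0.1, 0.1, 0.5, 0.3])
attended, erased = split_by_attention(features, attention)
loss = total_loss(np.array([2.0, 0.5, -1.0]), 0, attended, erased, margin=5.0)
```

Because every step only reuses the attention weights and features that an attention-based model already produces, a sketch like this adds no trainable parameters, which is consistent with the abstract's claim; the split and the margin term would be needed only during training.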
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.