Knowledge Commons of Institute of Automation, CAS
DMRM: A Dual-Channel Multi-Hop Reasoning Model for Visual Dialog |
Feilong Chen (1, 2, 3, 4) |
2020 |
Conference name | The 34th AAAI Conference on Artificial Intelligence |
Conference date | February 2020 |
Conference location | New York, USA |
Abstract | Visual Dialog is a vision-language task that requires an AI agent to engage in a conversation with humans grounded in an image. It remains challenging because the agent must fully understand a given question before producing an appropriate response drawn not only from the textual dialog history but also from visually grounded information. Previous models typically rely on single-hop or single-channel reasoning for this complex multimodal reasoning task, which is intuitively insufficient. In this paper, we therefore propose a novel and more powerful Dual-channel Multi-hop Reasoning Model for Visual Dialog, named DMRM. DMRM synchronously captures information from the dialog history and the image to enrich the semantic representation of the question through dual-channel reasoning. Specifically, DMRM maintains two channels: one obtains question- and history-aware image features, and the other obtains question- and image-aware dialog history features, each through a multi-hop reasoning process. Additionally, we design an effective multimodal attention mechanism that further enhances the decoder so it generates more accurate responses. Experimental results on the VisDial v0.9 and v1.0 datasets demonstrate that the proposed model is effective and outperforms the compared models by a significant margin. |
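Seven major research directions: sub-direction | Natural Language Processing |
State Key Laboratory planned research direction | Human-Machine Hybrid Intelligence |
Associated dataset requiring deposit | No |
Document type | Conference paper |
Item identifier | http://ir.ia.ac.cn/handle/173211/51918 |
Collection | Complex Systems Cognition and Decision-Making Laboratory: Auditory Models and Cognitive Computing |
Corresponding author | Jiaming Xu |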
Author affiliations | 1. Institute of Automation, Chinese Academy of Sciences (CASIA); 2. Pattern Recognition Center, WeChat AI, Tencent Inc., China; 3. Research Center for Brain-inspired Intelligence, CASIA; 4. University of Chinese Academy of Sciences; 5. Center for Excellence in Brain Science and Intelligence Technology, CAS, China |
First author's affiliation | Institute of Automation, Chinese Academy of Sciences |
Corresponding author's affiliation | Institute of Automation, Chinese Academy of Sciences |
Recommended citation (GB/T 7714) | Feilong Chen, Fandong Meng, Jiaming Xu, et al. DMRM: A Dual-Channel Multi-Hop Reasoning Model for Visual Dialog[C], 2020. |
Files in this item |
File name/size | Document type | Version type | Access type | License |
aaai.pdf (3052KB) | Conference paper | | Open access | CC BY-NC-SA |
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.