DMRM: A Dual-Channel Multi-Hop Reasoning Model for Visual Dialog

	Feilong Chen 1,2,3,4; Fandong Meng 2; Jiaming Xu 1,3; Peng Li 2; Bo Xu 1,3,4,5; Jie Zhou 2
	2020
Conference Name	The 34th AAAI Conference on Artificial Intelligence
Conference Date	February 2020
Conference Location	New York, USA
Abstract	Visual Dialog is a vision-language task that requires an AI agent to engage in a conversation with humans grounded in an image. It remains a challenging task since it requires the agent to fully understand a given question before making an appropriate response, drawing not only on the textual dialog history but also on the visually grounded information. Previous models typically leverage single-hop reasoning or single-channel reasoning to deal with this complex multimodal reasoning task, which is intuitively insufficient. In this paper, we thus propose a novel and more powerful Dual-channel Multi-hop Reasoning Model for Visual Dialog, named DMRM. DMRM synchronously captures information from the dialog history and the image to enrich the semantic representation of the question by exploiting dual-channel reasoning. Specifically, DMRM maintains a dual channel to obtain the question- and history-aware image features and the question- and image-aware dialog history features through a multi-hop reasoning process in each channel. Additionally, we design an effective multimodal attention mechanism to further enhance the decoder so that it generates more accurate responses. Experimental results on the VisDial v0.9 and v1.0 datasets demonstrate that the proposed model is effective and outperforms the compared models by a significant margin.
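Seven Major Directions: Sub-direction Classification	Natural Language Processing
State Key Laboratory Planned Direction Classification	Human-Machine Hybrid Intelligence
Associated Dataset Requiring Deposit	No
Document Type	Conference Paper
Item Identifier	http://ir.ia.ac.cn/handle/173211/51918
Collection	Laboratory of Cognition and Decision Intelligence for Complex Systems_Auditory Models and Cognitive Computing
Corresponding Author	Jiaming Xu
Author Affiliations	1. Institute of Automation, Chinese Academy of Sciences (CASIA); 2. Pattern Recognition Center, WeChat AI, Tencent Inc., China; 3. Research Center for Brain-inspired Intelligence, CASIA; 4. University of Chinese Academy of Sciences; 5. Center for Excellence in Brain Science and Intelligence Technology, CAS, China
First Author Affiliation	Institute of Automation, Chinese Academy of Sciences
Corresponding Author Affiliation	Institute of Automation, Chinese Academy of Sciences
Recommended Citation (GB/T 7714)	Feilong Chen,Fandong Meng,Jiaming Xu,et al. DMRM: A Dual-Channel Multi-Hop Reasoning Model for Visual Dialog[C],2020.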

Files in This Item
File Name/Size	Document Type	Version Type	Access Type	License
aaai.pdf (3052KB)	Conference Paper		Open Access	CC BY-NC-SA