DMRM: A Dual-Channel Multi-Hop Reasoning Model for Visual Dialog
Feilong Chen1,2,3,4; Fandong Meng2; Jiaming Xu1,3; Peng Li2; Bo Xu1,3,4,5; Jie Zhou2
2020
Conference name: The 34th AAAI Conference on Artificial Intelligence
Conference date: February 2020
Conference location: New York, USA
Abstract

Visual Dialog is a vision-language task that requires an AI agent to engage in a conversation with humans grounded in an image. It remains challenging because the agent must fully understand a given question before making an appropriate response, drawing not only on the textual dialog history but also on visually grounded information. Previous models typically leverage single-hop reasoning or single-channel reasoning to deal with this complex multimodal reasoning task, which is intuitively insufficient. In this paper, we thus propose a novel and more powerful Dual-channel Multi-hop Reasoning Model for Visual Dialog, named DMRM. DMRM synchronously captures information from the dialog history and the image to enrich the semantic representation of the question by exploiting dual-channel reasoning. Specifically, DMRM maintains a dual channel to obtain the question- and history-aware image features and the question- and image-aware dialog history features by a multi-hop reasoning process in each channel. Additionally, we design an effective multimodal attention to further enhance the decoder to generate more accurate responses. Experimental results on the VisDial v0.9 and v1.0 datasets demonstrate that the proposed model is effective and outperforms compared models by a significant margin.
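To make the dual-channel, multi-hop idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes plain dot-product attention, a residual sum as the fusion step, and an alternating hop order (image, history, image in one channel; history, image, history in the other). All function names (`attend`, `multi_hop`, `dual_channel`) and the toy feature vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, items):
    """Dot-product attention: weighted sum of `items`, weighted by query . item."""
    weights = softmax([sum(q * x for q, x in zip(query, item)) for item in items])
    dim = len(items[0])
    return [sum(w * item[d] for w, item in zip(weights, items)) for d in range(dim)]


def multi_hop(question, first, second, hops=3):
    """Alternate attention between two sources, starting on `first`.

    Assumption: after each hop the query is the question plus the attended
    context (a simple residual sum); the paper's fusion may differ.
    """
    query = question
    context = None
    for hop in range(hops):
        source = first if hop % 2 == 0 else second
        context = attend(query, source)
        query = [q + c for q, c in zip(question, context)]
    return context


def dual_channel(question, image_feats, history_feats, hops=3):
    """Run both channels and return (image-side, history-side) features."""
    image_aware = multi_hop(question, image_feats, history_feats, hops)
    history_aware = multi_hop(question, history_feats, image_feats, hops)
    return image_aware, history_aware


if __name__ == "__main__":
    # Toy 2-dimensional features, purely illustrative.
    question = [1.0, 0.0]
    image_feats = [[1.0, 0.0], [0.0, 1.0]]
    history_feats = [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]]
    print(dual_channel(question, image_feats, history_feats))
```

With an odd number of hops, each channel ends on its own modality, so the first returned vector is a question- and history-aware summary of the image features and the second is a question- and image-aware summary of the dialog history.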

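Seven major directions, sub-direction classification: Natural Language Processing
State Key Laboratory planned direction classification: Human-Machine Hybrid Intelligence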
Paper-associated dataset requiring deposit:
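Document type: Conference paper
Item identifier: http://ir.ia.ac.cn/handle/173211/51918
Collection: Laboratory of Cognition and Decision Intelligence for Complex Systems_Auditory Models and Cognitive Computing
Corresponding author: Jiaming Xu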
Author affiliations:
1.Institute of Automation, Chinese Academy of Sciences (CASIA)
2.Pattern Recognition Center, WeChat AI, Tencent Inc., China
3.Research Center for Brain-inspired Intelligence, CASIA
4.University of Chinese Academy of Sciences
5.Center for Excellence in Brain Science and Intelligence Technology, CAS, China
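First author affiliation: Institute of Automation, Chinese Academy of Sciences
Corresponding author affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended citation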
GB/T 7714
Feilong Chen, Fandong Meng, Jiaming Xu, et al. DMRM: A Dual-Channel Multi-Hop Reasoning Model for Visual Dialog[C], 2020.
Files in this item
File name/size: aaai.pdf (3052KB); Document type: Conference paper; Access type: Open access; License: CC BY-NC-SA
