Improving visual question answering using dropout and enhanced question encoder
Fang, Zhiwei (1,2); Liu, Jing (1); Li, Yong (3); Qiao, Yanyuan (2); Lu, Hanqing (1)
Source Publication: PATTERN RECOGNITION
ISSN: 0031-3203
2019-06-01
Volume: 90 Issue: 1 Pages: 404-414
Abstract

Using dropout in Visual Question Answering (VQA) is a common practice to prevent overfitting. However, the current way of using dropout in multi-path networks may cause two problems: the co-adaptation of neurons and the explosion of output variance. In this paper, we propose coherent dropout and siamese dropout mechanisms to solve these two problems, respectively. Specifically, in coherent dropout, the relevant dropout layers in multiple paths are forced to work coherently to maximize their ability to prevent neuron co-adaptation. We show that coherent dropout is simple to implement yet very effective at overcoming overfitting. To address the explosion of output variance, we develop a siamese dropout mechanism that explicitly minimizes the difference between the two output vectors produced from the same input data during the training phase. This mechanism can reduce the gap between the training and inference phases and make the VQA model more robust. With the help of these two techniques, we further design an enhanced question encoder called Multi-path Stacked Residual RNNs, which is deeper, wider, and more powerful than current shallow question encoders. Extensive experiments are conducted to verify the effectiveness of coherent dropout, siamese dropout, and the enhanced question encoder. The results show that our methods bring clear improvements to state-of-the-art VQA models on the VQA-v1 and VQA-v2 datasets. (C) 2019 Elsevier Ltd. All rights reserved.
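
The two dropout mechanisms named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how the dropout layers are made to "work coherently", so sharing a single mask across paths is only one assumed reading, and the element-wise product fusion, the mean-squared gap, and every function name below are hypothetical choices made for illustration.

```python
import random


def dropout_mask(size, p, rng):
    """Inverted-dropout mask: 0 with probability p, otherwise 1 / (1 - p)."""
    keep = 1.0 - p
    return [(1.0 / keep) if rng.random() < keep else 0.0 for _ in range(size)]


def apply_mask(vec, mask):
    """Element-wise product of a feature vector and a dropout mask."""
    return [v * m for v, m in zip(vec, mask)]


def coherent_dropout(paths, p, rng):
    """ASSUMPTION: 'coherent' is read here as one mask shared by all paths."""
    mask = dropout_mask(len(paths[0]), p, rng)
    return [apply_mask(path, mask) for path in paths], mask


def fuse(paths):
    """Toy multi-path fusion (hypothetical): element-wise product of the paths."""
    out = [1.0] * len(paths[0])
    for path in paths:
        out = [o * v for o, v in zip(out, path)]
    return out


def siamese_dropout_loss(paths, p, rng):
    """Run the same input through two stochastic passes and return the
    mean squared difference between the two fused output vectors."""
    out_a = fuse(coherent_dropout(paths, p, rng)[0])
    out_b = fuse(coherent_dropout(paths, p, rng)[0])
    return sum((a - b) ** 2 for a, b in zip(out_a, out_b)) / len(out_a)


rng = random.Random(0)
image_feat = [0.5, -1.0, 2.0, 0.25]      # made-up image-path features
question_feat = [1.0, 0.5, -0.5, 2.0]    # made-up question-path features

dropped, mask = coherent_dropout([image_feat, question_feat], p=0.5, rng=rng)
consistency = siamese_dropout_loss([image_feat, question_feat], p=0.5, rng=rng)
```

In a training loop, a term such as `consistency` would typically be added to the task loss with a weighting coefficient. With `p=0.0` both passes are identical and the term is zero, which mirrors the inference-time behaviour that the mechanism is meant to approach.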

Keyword: Visual question answering; Coherent dropout; Siamese dropout; Enhanced question encoder
DOI: 10.1016/j.patcog.2019.01.038
WOS Keyword: NETWORKS
Indexed By: SCI
Language: English
Funding Project: National Natural Science Foundation of China [61872366]; Beijing Municipal Natural Science Foundation [4192059]
WOS Research Area: Computer Science; Engineering
WOS Subject: Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic
WOS ID: WOS:000463130400033
Publisher: ELSEVIER SCI LTD
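Document Type: Journal article
Identifier: http://ir.ia.ac.cn/handle/173211/23484
Collection: National Laboratory of Pattern Recognition, Image and Video Analysis
Institute of Automation, Chinese Academy of Sciences
National Laboratory of Pattern Recognition
Corresponding Author: Liu, Jing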
Affiliation:
1. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
2. Univ Chinese Acad Sci, Beijing, Peoples R China
3. JD Com, Business Growth BU, Intelligent Advertising Lab, Beijing, Peoples R China
First Author Affiliation: Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
Corresponding Author Affiliation: Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
Recommended Citation
GB/T 7714: Fang, Zhiwei, Liu, Jing, Li, Yong, et al. Improving visual question answering using dropout and enhanced question encoder[J]. PATTERN RECOGNITION, 2019, 90(1): 404-414.
APA: Fang, Zhiwei, Liu, Jing, Li, Yong, Qiao, Yanyuan, & Lu, Hanqing. (2019). Improving visual question answering using dropout and enhanced question encoder. PATTERN RECOGNITION, 90(1), 404-414.
MLA: Fang, Zhiwei, et al. "Improving visual question answering using dropout and enhanced question encoder." PATTERN RECOGNITION 90.1 (2019): 404-414.
Files in This Item:
File Name/Size: Improving visual que(1624KB)
DocType: Journal article
Version: Author's accepted manuscript
Access: Open access
License: CC BY-NC-SA
File name: Improving visual question answering using dropout and enhanced question encoder.pdf
Format: Adobe PDF

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.