Learning visual relationship and context-aware attention for image captioning
Wang, Junbo; Wang, Wei; Wang, Liang; Wang, Zhiyong; Feng, Dagan; Tan, Tieniu
Source Publication: Pattern Recognition

Image captioning, which automatically generates natural language descriptions for images, has attracted considerable research attention, and attention-based captioning methods have made substantial progress. However, most attention-based image captioning methods focus on extracting visual information from regions of interest for sentence generation and usually ignore relational reasoning among those regions. Moreover, these methods do not take into account previously attended regions, which can be used to guide subsequent attention selection. In this paper, we propose a novel method that implicitly models the relationships among regions of interest in an image with a graph neural network, together with a novel context-aware attention mechanism that guides attention selection by fully memorizing previously attended visual content. Compared with existing attention-based image captioning methods, ours not only learns relation-aware visual representations for image captioning but also considers historical context information on previous attention. We perform extensive experiments on two public benchmark datasets, MS COCO and Flickr30K, and the experimental results indicate that our proposed method outperforms various state-of-the-art methods in terms of widely used evaluation metrics.
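
As an illustration only, and not the authors' implementation, the two ideas in the abstract can be sketched in plain Python: one round of message passing over a fully connected graph of region features, followed by an attention step whose query is guided by a memory of previously attended content. All function names, the dot-product scoring, the 0.5 mixing weight, and the running-average memory are assumptions made for this sketch; this record does not give the paper's actual graph neural network or attention formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def relate_regions(regions):
    """Hypothetical relation step: one round of message passing in which
    each region aggregates the other regions, weighted by similarity.
    Requires at least two regions."""
    updated = []
    for i, region in enumerate(regions):
        others = [o for j, o in enumerate(regions) if j != i]
        weights = softmax([dot(region, o) for o in others])
        message = weighted_sum(weights, others)
        # Assumed update rule: average the region with its incoming message.
        updated.append([0.5 * (a + b) for a, b in zip(region, message)])
    return updated


def context_aware_attention(query, regions, memory):
    """Hypothetical attention step: the query is shifted by a memory of
    previously attended content before scoring each region."""
    guided = [q + m for q, m in zip(query, memory)]
    weights = softmax([dot(guided, r) for r in regions])
    context = weighted_sum(weights, regions)
    return weights, context


# Toy data: three 2-dimensional region features and two decoding steps.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
relational = relate_regions(regions)

memory = [0.0, 0.0]
for step, query in enumerate([[1.0, 0.0], [0.0, 1.0]]):
    weights, context = context_aware_attention(query, relational, memory)
    # Assumed memory rule: running average of all attended context vectors.
    memory = [(step * m + c) / (step + 1) for m, c in zip(memory, context)]
```

In this toy version the memory is a simple running average, so every earlier step influences the next attention distribution equally; a learned recurrent memory would be a natural alternative.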

Keywords: Image captioning; Relational reasoning; Context-aware attention
WOS ID: WOS:000497600300019
Sub-direction classification: Image and video processing and analysis
Citation statistics
Cited Times: 92 [WOS]
Document Type: Journal article
Affiliation:
1. Center for Research on Intelligent Perception and Computing, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
2. Center for Excellence in Brain Science and Intelligence Technology
3. University of Chinese Academy of Sciences
4. School of Information Technologies, The University of Sydney
Recommended Citation
GB/T 7714: Wang, Junbo, Wang, Wei, Wang, Liang, et al. Learning visual relationship and context-aware attention for image captioning[J]. Pattern Recognition, 2020(98): 107075.
APA: Wang, Junbo, Wang, Wei, Wang, Liang, Wang, Zhiyong, Feng, Dagan, & Tan, Tieniu. (2020). Learning visual relationship and context-aware attention for image captioning. Pattern Recognition (98), 107075.
MLA: Wang, Junbo, et al. "Learning visual relationship and context-aware attention for image captioning." Pattern Recognition 98 (2020): 107075.
Files in This Item:
File Name/Size: PR.pdf (2059KB); DocType: Other; Access: Open access; License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.