CASIA OpenIR
Structure Preserving Convolutional Attention for Image Captioning
Lu, Shichen (1,2,5); Hu, Ruimin (1,2); Liu, Jing (3); Guo, Longteng (3); Zheng, Fei (4)
Source Publication: APPLIED SCIENCES-BASEL
2019-07-02
Volume: 9; Issue: 14; Pages: 10
Corresponding Author: Hu, Ruimin (hrm@whu.edu.cn)
Abstract: In the task of image captioning, learning the attentive image regions is necessary to adaptively and precisely focus on the object semantics relevant to each decoded word. In this paper, we propose a convolutional attention module that can preserve the spatial structure of the image by performing the convolution operation directly on the 2D feature maps. The proposed attention mechanism contains two components: convolutional spatial attention and cross-channel attention, aiming to determine the intended regions to describe the image along the spatial and channel dimensions, respectively. Both of the two attentions are calculated at each decoding step. In order to preserve the spatial structure, instead of operating on the vector representation of each image grid, the two attention components are both computed directly on the entire feature maps with convolution operations. Experiments on two large-scale datasets (MSCOCO and Flickr30K) demonstrate the outstanding performance of our proposed method.
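The abstract describes the two attention components only at a high level, so the following is a minimal NumPy sketch of the general idea, not the authors' implementation. It assumes a feature map `V` of shape (C, H, W), a decoder hidden state `h`, a 3x3 kernel for the spatial scores, and a 1D kernel slid across the channel axis for the channel scores; the tanh fusion, the average pooling, and all names (`W_s`, `W_c`, `k2d`, `k1d`) are illustrative assumptions that do not come from the paper.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def conv2d_same(x, k):
    """Zero-padded 2D cross-correlation: x (C, H, W), k (C, kh, kw) -> (H, W)."""
    _, height, width = x.shape
    _, kh, kw = k.shape
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((height, width))
    for i in range(kh):
        for j in range(kw):
            # Contract the channel axis for one kernel offset at a time.
            out += np.tensordot(k[:, i, j],
                                xp[:, i:i + height, j:j + width], axes=1)
    return out

def spatial_attention(V, h, W_s, k2d):
    """Score every location with a convolution over the whole 2D map."""
    query = W_s @ h                              # (C,), assumed projection
    fused = np.tanh(V + query[:, None, None])    # broadcast over H and W
    scores = conv2d_same(fused, k2d)             # (H, W), grid layout kept
    return softmax(scores.ravel()).reshape(scores.shape)

def channel_attention(V, h, W_c, k1d):
    """Score every channel with a 1D convolution across the channel axis."""
    pooled = V.mean(axis=(1, 2))                 # (C,), assumed pooling
    fused = np.tanh(pooled + W_c @ h)
    scores = np.convolve(fused, k1d, mode="same")  # neighbouring channels mix
    return softmax(scores)

# Toy inputs standing in for CNN features and a decoder state (hypothetical).
rng = np.random.default_rng(0)
C, H, W, D = 8, 5, 5, 16
V = rng.standard_normal((C, H, W))
h = rng.standard_normal(D)
W_s = rng.standard_normal((C, D)) * 0.1
W_c = rng.standard_normal((C, D)) * 0.1
k2d = rng.standard_normal((C, 3, 3)) * 0.1
k1d = rng.standard_normal(3) * 0.1

alpha = spatial_attention(V, h, W_s, k2d)        # weights over H x W
beta = channel_attention(V, h, W_c, k1d)         # weights over C
# Attended context vector for one decoding step.
context = (V * alpha[None, :, :] * beta[:, None, None]).sum(axis=(1, 2))
```

In this sketch both weight sets depend on `h`, so they would be recomputed at every decoding step, and the spatial scores come from a convolution over the intact (H, W) grid rather than from per-location vectors scored independently.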
Keywords: image captioning; attention; spatial structure; deep learning; computer vision
DOI: 10.3390/app9142888
Indexed By: SCI
Language: English
Funding Project: National Natural Science Foundation of China [U1736206]
Funding Organization: National Natural Science Foundation of China
WOS Research Area: Chemistry; Materials Science; Physics
WOS Subject: Chemistry, Multidisciplinary; Materials Science, Multidisciplinary; Physics, Applied
WOS ID: WOS:000479026900115
Publisher: MDPI
Document Type: Journal article
Identifier: http://ir.ia.ac.cn/handle/173211/27614
Collection: Institute of Automation, Chinese Academy of Sciences
Affiliation:
1. Wuhan Univ, Natl Engn Res Ctr Multimedia Software, Sch Comp, Wuhan 430072, Hubei, Peoples R China
2. Wuhan Univ, Hubei Key Lab Multimedia & Network Commun Engn, Wuhan 430072, Hubei, Peoples R China
3. Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing 100190, Peoples R China
4. China Gen Technol Res Inst, Beijing 100190, Peoples R China
5. Wuhan Univ, Informat Dept, Dormitory 8, Room 617, Wuhan 430072, Hubei, Peoples R China
Recommended Citation
GB/T 7714: Lu, Shichen, Hu, Ruimin, Liu, Jing, et al. Structure Preserving Convolutional Attention for Image Captioning[J]. APPLIED SCIENCES-BASEL, 2019, 9(14): 10.
APA: Lu, Shichen, Hu, Ruimin, Liu, Jing, Guo, Longteng, & Zheng, Fei. (2019). Structure Preserving Convolutional Attention for Image Captioning. APPLIED SCIENCES-BASEL, 9(14), 10.
MLA: Lu, Shichen, et al. "Structure Preserving Convolutional Attention for Image Captioning". APPLIED SCIENCES-BASEL 9.14 (2019): 10.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.