Knowledge Commons of Institute of Automation,CAS
Structure Preserving Convolutional Attention for Image Captioning | |
Lu, Shichen1,2,5; Hu, Ruimin1,2; Liu, Jing3; Guo, Longteng3; Zheng, Fei4 | |
Journal | APPLIED SCIENCES-BASEL |
2019-07-02 | |
Volume | 9; Issue: 14; Pages: 10 |
Abstract | In the task of image captioning, learning the attentive image regions is necessary to adaptively and precisely focus on the object semantics relevant to each decoded word. In this paper, we propose a convolutional attention module that can preserve the spatial structure of the image by performing the convolution operation directly on the 2D feature maps. The proposed attention mechanism contains two components: convolutional spatial attention and cross-channel attention, aiming to determine the intended regions to describe the image along the spatial and channel dimensions, respectively. Both of the two attentions are calculated at each decoding step. In order to preserve the spatial structure, instead of operating on the vector representation of each image grid, the two attention components are both computed directly on the entire feature maps with convolution operations. Experiments on two large-scale datasets (MSCOCO and Flickr30K) demonstrate the outstanding performance of our proposed method. |
Keywords | image captioning; attention; spatial structure; deep learning; computer vision |
DOI | 10.3390/app9142888 |
Indexed By | SCI |
Language | English |
Funding Project | National Natural Science Foundation of China[U1736206] |
WOS Research Area | Chemistry ; Materials Science ; Physics |
WOS Subject | Chemistry, Multidisciplinary ; Materials Science, Multidisciplinary ; Physics, Applied |
WOS ID | WOS:000479026900115 |
Publisher | MDPI |
Document Type | Journal article |
Identifier | http://ir.ia.ac.cn/handle/173211/27614 |
Collection | Zidong Taichu Large Model Research Center_Image and Video Analysis |
Corresponding Author | Hu, Ruimin |
Affiliation | 1.Wuhan Univ, Natl Engn Res Ctr Multimedia Software, Sch Comp, Wuhan 430072, Hubei, Peoples R China 2.Wuhan Univ, Hubei Key Lab Multimedia & Network Commun Engn, Wuhan 430072, Hubei, Peoples R China 3.Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing 100190, Peoples R China 4.China Gen Technol Res Inst, Beijing 100190, Peoples R China 5.Wuhan Univ, Informat Dept, Dormitory 8,Room 617, Wuhan 430072, Hubei, Peoples R China |
Recommended Citation GB/T 7714 | Lu, Shichen,Hu, Ruimin,Liu, Jing,et al. Structure Preserving Convolutional Attention for Image Captioning[J]. APPLIED SCIENCES-BASEL,2019,9(14):10. |
APA | Lu, Shichen,Hu, Ruimin,Liu, Jing,Guo, Longteng,&Zheng, Fei.(2019).Structure Preserving Convolutional Attention for Image Captioning.APPLIED SCIENCES-BASEL,9(14),10. |
MLA | Lu, Shichen,et al."Structure Preserving Convolutional Attention for Image Captioning".APPLIED SCIENCES-BASEL 9.14(2019):10. |
Files in This Item | ||||||
File Name/Size | Document Type | Version | Access | License | ||
[Applied Science] St(2351KB) | Journal article | Author accepted manuscript | Open access | CC BY-NC-SA | |
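
The abstract in this record describes two attention components that are computed on the whole 2D feature maps with convolutions, once per decoding step. The following is a minimal sketch of that general idea in plain Python, not the authors' implementation. Everything named here is an assumption made for illustration: the function names (`conv_score_map`, `attend`), the way the decoder state enters (a per-channel offset `state_proj` followed by `tanh`), the 1x1 channel-mixing matrix `channel_mix`, and the use of softmax for both sets of weights. The paper itself should be consulted for the actual formulation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a flat list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def conv_score_map(z, kernel):
    """Slide one C x k x k kernel (k odd) over C x H x W maps with zero padding.

    As in most deep learning libraries this is technically a cross-correlation.
    The output is a single H x W map, so each score depends on the 2D
    neighbourhood of its grid cell rather than on that cell alone.
    """
    C, H, W = len(z), len(z[0]), len(z[0][0])
    k = len(kernel[0])
    r = k // 2
    scores = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            for c in range(C):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - r, j + dj - r
                        if 0 <= y < H and 0 <= x < W:
                            scores[i][j] += z[c][y][x] * kernel[c][di][dj]
    return scores

def attend(fmaps, state_proj, spatial_kernel, channel_mix):
    """One decoding step of a hypothetical spatial + channel attention.

    fmaps:          C x H x W nested lists of image features
    state_proj:     length-C list, decoder state assumed already projected
    spatial_kernel: C x k x k weights producing the spatial score map
    channel_mix:    C x C weights of an assumed 1x1 cross-channel convolution
    """
    C, H, W = len(fmaps), len(fmaps[0]), len(fmaps[0][0])
    # Condition every feature map on the current decoder state.
    z = [[[math.tanh(fmaps[c][i][j] + state_proj[c]) for j in range(W)]
          for i in range(H)] for c in range(C)]
    # Spatial attention: one weight per grid cell, normalised over H*W cells.
    scores = conv_score_map(z, spatial_kernel)
    flat = softmax([s for row in scores for s in row])
    alpha = [flat[i * W:(i + 1) * W] for i in range(H)]
    # Channel attention: mix channels at every cell (1x1 convolution),
    # average each mixed map over space, then normalise over channels.
    chan_scores = []
    for out_c in range(C):
        total = sum(channel_mix[out_c][c] * z[c][i][j]
                    for c in range(C) for i in range(H) for j in range(W))
        chan_scores.append(total / (H * W))
    beta = softmax(chan_scores)
    # Context vector: spatially pooled features, reweighted per channel.
    context = [beta[c] * sum(alpha[i][j] * fmaps[c][i][j]
                             for i in range(H) for j in range(W))
               for c in range(C)]
    return alpha, beta, context
```

With all-zero weights both attentions reduce to uniform averaging, which gives a quick sanity check; learned weights and a changing decoder state would make `alpha` and `beta` differ from step to step.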