Boosted Transformer for Image Captioning
Li, Jiangyun (1,2); Yao, Peng (1,2,4); Guo, Longteng (3); Zhang, Weicun (1,2)
Journal: APPLIED SCIENCES-BASEL
Publication date: 2019-08-01
Volume: 9; Issue: 16; Pages: 15
Abstract

Image captioning aims to generate a description of a given image. It usually uses a Convolutional Neural Network as the encoder to extract visual features and a sequence model as the decoder to generate descriptions; among sequence models, the self-attention mechanism has recently made notable progress. However, this predominant encoder-decoder architecture still has open problems. On the encoder side, without semantic concepts, the extracted visual features do not make full use of the image information. On the decoder side, sequence self-attention relies only on word representations, so it lacks the guidance of visual information and is easily influenced by the language prior. In this paper, we propose a novel boosted transformer model that addresses these problems with two attention modules: Concept-Guided Attention (CGA) and Vision-Guided Attention (VGA). Our model uses CGA in the encoder to obtain boosted visual features by integrating instance-level concepts into the visual features. In the decoder, we stack VGA, which uses the visual information as a bridge to model internal relationships among the sequences and can serve as an auxiliary module to sequence self-attention. Quantitative and qualitative results on the Microsoft COCO dataset show that our model outperforms state-of-the-art approaches.
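
The idea of integrating concepts into visual features through attention can be illustrated with a generic cross-attention sketch. This is not the authors' CGA module: the abstract does not give its equations, so the code below only assumes standard scaled dot-product attention in which visual region features act as queries over concept embeddings, followed by a residual addition. The function names, the feature sizes (36 regions, 5 concepts, 64 dimensions), and the random inputs are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(queries, keys, values):
    # Standard transformer attention: softmax(Q K^T / sqrt(d_k)) V.
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights

def concept_guided_features(visual, concepts):
    # ASSUMPTION: visual regions query the concept embeddings, and the
    # attended concept information is added back to the visual features.
    # The paper's actual CGA formulation may differ.
    attended, weights = scaled_dot_product_attention(visual, concepts, concepts)
    return visual + attended, weights

# Hypothetical inputs: 36 region features and 5 concept embeddings, 64-d each.
rng = np.random.default_rng(0)
visual = rng.normal(size=(36, 64))
concepts = rng.normal(size=(5, 64))

boosted, weights = concept_guided_features(visual, concepts)
print(boosted.shape, weights.shape)
```

Each row of `weights` is a distribution over the concepts for one visual region, and `boosted` keeps the shape of the original visual features, so it could be passed to a decoder in place of them.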

Keywords: image captioning; self-attention; deep learning; transformer
DOI: 10.3390/app9163260
Indexed by: SCI
Language: English
Funding: Beijing Natural Science Foundation [4182038]; National Nature Science Foundation of China [61671054]
WOS research areas: Chemistry; Materials Science; Physics
WOS categories: Chemistry, Multidisciplinary; Materials Science, Multidisciplinary; Physics, Applied
WOS accession number: WOS:000484444100054
Publisher: MDPI
Citation statistics
Times cited: 10 [WOS]
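Document type: Journal article
Item identifier: http://ir.ia.ac.cn/handle/173211/27242
Collection: Zidong Taichu Large Model Research Center_Image and Video Analysis
Corresponding author: Zhang, Weicun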
Author affiliations:
1. Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China
2. Minist Educ, Key Lab Knowledge Automat Ind Proc, Beijing 100083, Peoples R China
3. Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing 100190, Peoples R China
4. Univ Sci & Technol Beijing, Beijing 100083, Peoples R China
Recommended citation
GB/T 7714: Li, Jiangyun, Yao, Peng, Guo, Longteng, et al. Boosted Transformer for Image Captioning[J]. APPLIED SCIENCES-BASEL, 2019, 9(16): 15.
APA: Li, Jiangyun, Yao, Peng, Guo, Longteng, & Zhang, Weicun. (2019). Boosted Transformer for Image Captioning. APPLIED SCIENCES-BASEL, 9(16), 15.
MLA: Li, Jiangyun, et al. "Boosted Transformer for Image Captioning". APPLIED SCIENCES-BASEL 9.16 (2019): 15.
Files in this item
File name/size: Boosted Transformer (2184KB); Document type: Journal article; Version: Author accepted manuscript; Access: Open access; License: CC BY-NC-SA
File name: Boosted Transformer for Image Captioning.pdf
Format: Adobe PDF
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.