Boosted Transformer for Image Captioning
Li, Jiangyun (1,2); Yao, Peng (1,2,4); Guo, Longteng (3); Zhang, Weicun (1,2)
Corresponding Author: Zhang, Weicun
Abstract: Image captioning aims to generate a description of a given image. Most approaches use a Convolutional Neural Network as the encoder to extract visual features and a sequence model as the decoder to generate the description; among sequence models, those based on self-attention have recently made notable progress. However, this predominant encoder-decoder architecture still has unresolved problems. On the encoder side, the extracted visual features lack semantic concepts and therefore do not make full use of the image information. On the decoder side, sequence self-attention relies only on word representations, so it lacks guidance from visual information and is easily influenced by the language prior. In this paper, we propose a novel boosted transformer model that addresses these problems with two attention modules: Concept-Guided Attention (CGA) and Vision-Guided Attention (VGA). In the encoder, CGA integrates instance-level concepts into the visual features to obtain boosted visual features. In the decoder, we stack VGA, which uses visual information as a bridge to model internal relationships among the sequences and can serve as an auxiliary module to sequence self-attention. Quantitative and qualitative results on the Microsoft COCO dataset show that our model outperforms state-of-the-art approaches.
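The abstract describes both modules as attention in which one set of features is guided by another. The following is a minimal sketch of that general idea only, assuming plain scaled dot-product attention with a residual connection; it is not the paper's formulation. The helper name `guided_attention`, the toy vectors, and the choice of which features act as queries are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    """Scaled dot-product attention: each query becomes a weighted mix of values."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        mixed = [sum(w * v[d] for w, v in zip(weights, values))
                 for d in range(len(values[0]))]
        outputs.append(mixed)
    return outputs

def guided_attention(targets, guides):
    """Hypothetical helper: targets attend over guides, plus a residual connection."""
    attended = attend(targets, guides, guides)
    return [[t + a for t, a in zip(row, att)]
            for row, att in zip(targets, attended)]

# Toy inputs (made up): 2 image-region features, 3 concept embeddings, dimension 4.
visual = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]]
concepts = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]

# CGA-like use (assumption): visual features are enriched with concept information.
boosted_visual = guided_attention(visual, concepts)

# VGA-like use (assumption): word representations are enriched with visual information.
words = [[0.2, 0.1, 0.0, 0.3]]
guided_words = guided_attention(words, boosted_visual)
```

In this sketch the same routine serves both roles, and only the choice of targets and guides changes. A real model would add learned projections, multiple heads, and normalization, none of which are shown here.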
Keywords: image captioning; self-attention; deep learning; transformer
Indexed By: SCI
Funding Project: National Natural Science Foundation of China [61671054]; Beijing Natural Science Foundation [4182038]
Funding Organization: National Natural Science Foundation of China; Beijing Natural Science Foundation
WOS Research Area: Chemistry; Materials Science; Physics
WOS Subject: Chemistry, Multidisciplinary; Materials Science, Multidisciplinary; Physics, Applied
WOS ID: WOS:000484444100054
Citation statistics
Cited Times: 1 [WOS]
Document Type: Journal article
Affiliation:
1. Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China
2. Minist Educ, Key Lab Knowledge Automat Ind Proc, Beijing 100083, Peoples R China
3. Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing 100190, Peoples R China
4. Univ Sci & Technol Beijing, Beijing 100083, Peoples R China
Recommended Citation
GB/T 7714: Li, Jiangyun, Yao, Peng, Guo, Longteng, et al. Boosted Transformer for Image Captioning[J]. APPLIED SCIENCES-BASEL, 2019, 9(16): 15.
APA: Li, Jiangyun, Yao, Peng, Guo, Longteng, & Zhang, Weicun. (2019). Boosted Transformer for Image Captioning. APPLIED SCIENCES-BASEL, 9(16), 15.
MLA: Li, Jiangyun, et al. "Boosted Transformer for Image Captioning." APPLIED SCIENCES-BASEL 9.16 (2019): 15.