Image captioning via hierarchical attention mechanism and policy gradient optimization
Yan, Shiyang [1]; Xie, Yuan [2,3,5,7]; Wu, Fangyu [4,6]; Smith, Jeremy S. [4]; Lu, Wenjin [6]; Zhang, Bailing [2,3]
Journal: SIGNAL PROCESSING
ISSN: 0165-1684
Publication date: 2020-02-01
Volume: 167; Pages: 12
Corresponding author: Yan, Shiyang (shiyang.yan@qub.ac.uk)
Abstract: Automatically generating the descriptions of an image, i.e., image captioning, is an important and fundamental topic in artificial intelligence, which bridges the gap between computer vision and natural language processing. Based on the successful deep learning models, especially the CNN model and Long Short Term Memories (LSTMs) with attention mechanism, we propose a hierarchical attention model by utilizing both of the global CNN features and the local object features for more effective feature representation and reasoning in image captioning. The generative adversarial network (GAN), together with a reinforcement learning (RL) algorithm, is applied to solve the exposure bias problem in RNN-based supervised training for language problems. In addition, through the automatic measurement of the consistency between the generated caption and the image content by the discriminator in the GAN framework and RL optimization, we make the finally generated sentences more accurate and natural. Comprehensive experiments show the improved performance of the hierarchical attention mechanism and the effectiveness of our RL-based optimization method. Our model achieves state-of-the-art results on several important metrics in the MSCOCO dataset, using only greedy inference. (C) 2019 Elsevier B.V. All rights reserved.
Keywords: Image captioning; Hierarchical attention mechanism; Generative adversarial network; Reinforcement learning; Policy gradient
DOI: 10.1016/j.sigpro.2019.107329
Keywords [WOS]: NETWORKS
Indexed by: SCI
Language: English
WOS research area: Engineering
WOS category: Engineering, Electrical & Electronic
WOS accession number: WOS:000497600200030
Publisher: ELSEVIER
Citation statistics
Times cited: 24 [WOS]
Document type: Journal article
Item identifier: http://ir.ia.ac.cn/handle/173211/29388
Collection: Research Center of Precision Sensing and Control
Corresponding author: Yan, Shiyang
Author affiliations:
1. Queens Univ Belfast, Sch Elect Elect Engn & Comp Sci, Belfast, Antrim, North Ireland
2. Inst Adv Artificial Intelligence Nanjing, Nanjing, Jiangsu, Peoples R China
3. Horizon Robot, Beijing, Peoples R China
4. Univ Liverpool, Elect Engn & Elect, Liverpool, Merseyside, England
5. Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
6. Xian Jiaotong Liverpool Univ, Dept Comp Sci & Software Engn, Suzhou, Peoples R China
7. East China Normal Univ, Sch Comp Sci & Software Engn, Shanghai, Peoples R China
Recommended citation
GB/T 7714: Yan, Shiyang, Xie, Yuan, Wu, Fangyu, et al. Image captioning via hierarchical attention mechanism and policy gradient optimization[J]. SIGNAL PROCESSING, 2020, 167: 12.
APA: Yan, Shiyang, Xie, Yuan, Wu, Fangyu, Smith, Jeremy S., Lu, Wenjin, & Zhang, Bailing. (2020). Image captioning via hierarchical attention mechanism and policy gradient optimization. SIGNAL PROCESSING, 167, 12.
MLA: Yan, Shiyang, et al. "Image captioning via hierarchical attention mechanism and policy gradient optimization". SIGNAL PROCESSING 167 (2020): 12.