Comprehensive Relation Modelling for Image Paragraph Generation
Xianglu Zhu1,2; Zhang Zhang2,3; Wei Wang2; Zilei Wang1
Journal: Machine Intelligence Research
ISSN: 2731-538X
Year: 2024
Volume: 21, Issue: 2, Pages: 369-382
Abstract: Image paragraph generation aims to generate a long description composed of multiple sentences, unlike traditional image captioning, which produces only one sentence. Most previous methods are dedicated to extracting rich features from image regions and ignore modelling the visual relationships. In this paper, we propose a novel method to generate a paragraph by modelling visual relationships comprehensively. First, we parse an image into a scene graph, where each node represents a specific object and each edge denotes the relationship between two objects. Second, we enrich the object features by implicitly encoding visual relationships through a graph convolutional network (GCN). We further explore high-order relations between different relation features using another graph convolutional network. In addition, we obtain the linguistic features by projecting the predicted object labels and their relationships into a semantic embedding space. With these features, we present an attention-based topic generation network to select relevant features and produce a set of topic vectors, which are then utilized to generate multiple sentences. We evaluate the proposed method on the Stanford image-paragraph dataset, which is currently the only available dataset for image paragraph generation, and our method achieves competitive performance in comparison with other state-of-the-art (SOTA) methods.
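The scene-graph and GCN steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: the function names, the toy features, the mean-aggregation rule, and the identity weight matrix are all assumptions chosen for illustration, and the predicate labels on the edges are ignored here, whereas the paper also encodes them. The sketch only shows how one round of graph convolution mixes each object's feature with those of the objects it is related to.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(weight: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def gcn_layer(features: Dict[str, Vector],
              edges: List[Tuple[str, str, str]],
              weight: Matrix) -> Dict[str, Vector]:
    """One mean-aggregation graph convolution over a scene graph.

    features: object name -> feature vector (hypothetical toy features).
    edges: (subject, predicate, object) triples; the predicate is unused
           in this simplified sketch.
    weight: shared linear transform applied after aggregation.
    """
    # Treat relationships as undirected and add a self-loop to every node.
    neighbours = {node: {node} for node in features}
    for subj, _predicate, obj in edges:
        neighbours[subj].add(obj)
        neighbours[obj].add(subj)

    updated = {}
    for node, nbrs in neighbours.items():
        dim = len(features[node])
        # Average the features of the node and its neighbours.
        mean = [sum(features[n][k] for n in nbrs) / len(nbrs)
                for k in range(dim)]
        # Linear transform followed by ReLU.
        updated[node] = [max(0.0, v) for v in matvec(weight, mean)]
    return updated


# Toy scene graph: "man riding horse", "man wearing hat".
features = {"man": [1.0, 0.0], "horse": [0.0, 1.0], "hat": [1.0, 1.0]}
edges = [("man", "riding", "horse"), ("man", "wearing", "hat")]
identity = [[1.0, 0.0], [0.0, 1.0]]

enriched = gcn_layer(features, edges, identity)
print(enriched)
```

With the identity weight, each enriched feature is simply the average over the object and its related objects, which makes the effect of the relationships easy to inspect by hand. A learned weight matrix and relation-aware aggregation would replace these simplifications in a real model.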
Keywords: Image paragraph generation, visual relationship, scene graph, graph convolutional network (GCN), long short-term memory
DOI: 10.1007/s11633-022-1408-2
Document type: Journal article
Item identifier: http://ir.ia.ac.cn/handle/173211/56044
Collection: Academic Journals_Machine Intelligence Research
Author affiliations:
1. Automation Department, University of Science and Technology of China, Hefei 230027, China
2. Center for Research on Intelligent Perception and Computing, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
3. University of Chinese Academy of Sciences, Beijing 100864, China
First author affiliation: National Laboratory of Pattern Recognition
Recommended citation:
GB/T 7714: Xianglu Zhu, Zhang Zhang, Wei Wang, et al. Comprehensive Relation Modelling for Image Paragraph Generation[J]. Machine Intelligence Research, 2024, 21(2): 369-382.
APA: Xianglu Zhu, Zhang Zhang, Wei Wang, & Zilei Wang. (2024). Comprehensive Relation Modelling for Image Paragraph Generation. Machine Intelligence Research, 21(2), 369-382.
MLA: Xianglu Zhu, et al. "Comprehensive Relation Modelling for Image Paragraph Generation". Machine Intelligence Research 21.2 (2024): 369-382.
Files in this item:
File: MIR-2022-08-253.pdf (1963 KB); Document type: Journal article; Version: Published version; Access: Open access; License: CC BY-NC-SA