Scene captioning with deep fusion of images and point clouds
Yu, Qiang1,3; Zhang, Chunxia4; Weng, Lubin1; Xiang, Shiming2,3; Pan, Chunhong2
Journal: PATTERN RECOGNITION LETTERS
ISSN: 0167-8655
Publication date: 2022-06-01
Volume: 158; Pages: 9-15
Corresponding author: Yu, Qiang (qiang.yu@ia.ac.cn)
Abstract: Recently, the fusion of images and point clouds has received considerable attention in various fields, such as autonomous driving, where its advantage over single-modal vision has been verified. However, it has not been extensively explored in the scene captioning task. In this paper, a novel scene captioning framework with deep fusion of images and point clouds, based on region correlation and attention, is proposed to improve the performance of captioning models. In our model, a symmetrical processing pipeline is designed for point clouds and images. First, 3D and 2D region features are generated, respectively, through region proposal generation, proposal fusion, and region pooling modules. Then, a feature fusion module is designed to integrate features according to the region correlation rule and the attention mechanism, which increases the interpretability of the fusion process and results in a sequence of fused visual features. Finally, the fused features are transformed into captions by an attention-based caption generation module. Comprehensive experiments indicate that the performance of our model reaches the state of the art. (c) 2022 Elsevier B.V. All rights reserved.
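The feature fusion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no details of the region correlation rule or the attention form, so the sketch assumes that (a) 3D regions have already been projected to 2D boxes in the image plane, (b) two regions are "correlated" when the IoU of their boxes meets a threshold, (c) attention is a scaled dot-product softmax over the correlated 3D regions, and (d) 2D and 3D features have the same dimension. The names `iou`, `softmax`, `fuse`, and `iou_thresh` are hypothetical.

```python
import math

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(boxes_2d, feats_2d, boxes_3d_proj, feats_3d, iou_thresh=0.3):
    """For each 2D region, attend over the 3D regions correlated with it
    (assumed rule: IoU >= iou_thresh) and concatenate the attended 3D
    context to the 2D feature. Returns one fused vector per 2D region."""
    dim = len(feats_3d[0])
    fused = []
    for box, f2 in zip(boxes_2d, feats_2d):
        # Assumed region correlation rule: keep overlapping 3D regions only.
        idx = [j for j, b in enumerate(boxes_3d_proj) if iou(box, b) >= iou_thresh]
        if idx:
            # Scaled dot-product attention scores over correlated regions.
            scores = [sum(x * y for x, y in zip(f2, feats_3d[j])) / math.sqrt(dim)
                      for j in idx]
            weights = softmax(scores)
            ctx = [sum(w * feats_3d[j][k] for w, j in zip(weights, idx))
                   for k in range(dim)]
        else:
            # No correlated 3D region: fall back to a zero context vector.
            ctx = [0.0] * dim
        fused.append(list(f2) + ctx)
    return fused

# Toy inputs: two 2D regions, two projected 3D regions, 2-dimensional features.
boxes_2d = [(0, 0, 10, 10), (100, 100, 110, 110)]
feats_2d = [[1.0, 0.0], [0.0, 1.0]]
boxes_3d_proj = [(0, 0, 10, 10), (1, 1, 11, 11)]
feats_3d = [[2.0, 0.0], [0.0, 2.0]]
fused = fuse(boxes_2d, feats_2d, boxes_3d_proj, feats_3d)
```

In this toy setup the first 2D region overlaps both projected 3D regions and receives an attention-weighted mix of their features, while the second overlaps none and receives a zero context. Restricting attention to correlated regions is one way a fusion step can stay interpretable, since each fused vector can be traced back to specific region pairs.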
Keywords: Scene captioning; Point cloud; Deep fusion
DOI: 10.1016/j.patrec.2022.04.017
Indexed by: SCI
Language: English
Funding projects: National Key Research and Development Program of China [2020AAA0104903]; National Natural Science Foundation of China [62072039]; National Natural Science Foundation of China [62076242]; National Natural Science Foundation of China [61976208]
Funders: National Key Research and Development Program of China; National Natural Science Foundation of China
WOS research area: Computer Science
WOS category: Computer Science, Artificial Intelligence
WOS accession number: WOS:000797731300002
Publisher: ELSEVIER
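Document type: Journal article
Identifier: http://ir.ia.ac.cn/handle/173211/49501
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems, Advanced Spatio-Temporal Data Analysis and Learning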
Author affiliations:
1. Chinese Acad Sci, Inst Automat, Res Ctr Aerosp Informat, Beijing 100190, Peoples R China
2. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
3. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
4. Beijing Inst Technol, Sch Comp Sci & Technol, Beijing 100081, Peoples R China
First author affiliation: Institute of Automation, Chinese Academy of Sciences
Corresponding author affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended citation:
GB/T 7714: Yu, Qiang, Zhang, Chunxia, Weng, Lubin, et al. Scene captioning with deep fusion of images and point clouds[J]. PATTERN RECOGNITION LETTERS, 2022, 158: 9-15.
APA: Yu, Qiang, Zhang, Chunxia, Weng, Lubin, Xiang, Shiming, & Pan, Chunhong. (2022). Scene captioning with deep fusion of images and point clouds. PATTERN RECOGNITION LETTERS, 158, 9-15.
MLA: Yu, Qiang, et al. "Scene captioning with deep fusion of images and point clouds". PATTERN RECOGNITION LETTERS 158 (2022): 9-15.