Scene captioning with deep fusion of images and point clouds

doi:10.1016/j.patrec.2022.04.017


Title	Scene captioning with deep fusion of images and point clouds
Authors	Yu, Qiang [1,3]; Zhang, Chunxia [4]; Weng, Lubin [1]; Xiang, Shiming [2,3]; Pan, Chunhong [2]
Journal	PATTERN RECOGNITION LETTERS
ISSN	0167-8655
Publication date	2022-06-01
Volume	158, Pages: 9-15
Corresponding author	Yu, Qiang (qiang.yu@ia.ac.cn)
Abstract	Recently, the fusion of images and point clouds has received considerable attention in various fields, such as autonomous driving, where its advantage over single-modal vision has been verified. However, it has not been extensively exploited in the scene captioning task. In this paper, a novel scene captioning framework with deep fusion of images and point clouds, based on region correlation and attention, is proposed to improve the performance of captioning models. In our model, a symmetrical processing pipeline is designed for point clouds and images. First, 3D and 2D region features are generated respectively through region proposal generation, proposal fusion, and region pooling modules. Then, a feature fusion module integrates the features according to the region correlation rule and the attention mechanism, which increases the interpretability of the fusion process and yields a sequence of fused visual features. Finally, the fused features are transformed into captions by an attention-based caption generation module. Comprehensive experiments indicate that the performance of our model reaches the state of the art. © 2022 Elsevier B.V. All rights reserved.
Keywords	Scene captioning; Point cloud; Deep fusion
DOI	10.1016/j.patrec.2022.04.017
Indexed by	SCI
Language	English
Funding project	National Key Research and Development Program of China [2020AAA0104903]; National Natural Science Foundation of China [62072039]; National Natural Science Foundation of China [62076242]; National Natural Science Foundation of China [61976208]
Funding organization	National Key Research and Development Program of China; National Natural Science Foundation of China
WOS research area	Computer Science
WOS subject category	Computer Science, Artificial Intelligence
WOS accession number	WOS:000797731300002
Publisher	ELSEVIER
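Document type	Journal article
Identifier	http://ir.ia.ac.cn/handle/173211/49501
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems_Advanced Spatio-Temporal Data Analysis and Learning
Corresponding author	Yu, Qiang
Author affiliations	1. Chinese Acad Sci, Inst Automat, Res Ctr Aerosp Informat, Beijing 100190, Peoples R China; 2. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China; 3. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China; 4. Beijing Inst Technol, Sch Comp Sci & Technol, Beijing 100081, Peoples R China
First author affiliation	Institute of Automation, Chinese Academy of Sciences
Corresponding author affiliation	Institute of Automation, Chinese Academy of Sciences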
Recommended citation (GB/T 7714)	Yu, Qiang, Zhang, Chunxia, Weng, Lubin, et al. Scene captioning with deep fusion of images and point clouds[J]. PATTERN RECOGNITION LETTERS, 2022, 158: 9-15.
APA	Yu, Qiang, Zhang, Chunxia, Weng, Lubin, Xiang, Shiming, & Pan, Chunhong. (2022). Scene captioning with deep fusion of images and point clouds. PATTERN RECOGNITION LETTERS, 158, 9-15.
MLA	Yu, Qiang, et al. "Scene captioning with deep fusion of images and point clouds". PATTERN RECOGNITION LETTERS 158 (2022): 9-15.
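
Illustrative note: the abstract describes a feature fusion module that combines 2D and 3D region features "according to the region correlation rule and the attention mechanism", but this record does not give the actual rule or network. The sketch below is therefore only one plausible reading, not the authors' method. It assumes that a correlation score between each 2D region and each 3D region is already available (for example an overlap measure, which is an assumption here), that both feature sets share the same dimension, and that scaled dot-product attention is used. The names `fuse_regions`, `correlation`, and `threshold` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_regions(feats_2d, feats_3d, correlation, threshold=0.5):
    """Fuse each 2D region feature with its correlated 3D region features.

    ASSUMPTIONS (not taken from the paper):
    - correlation[i][j] is a precomputed score between 2D region i
      and 3D region j; pairs below `threshold` are ignored.
    - 2D and 3D features have the same dimension.
    - Attention is scaled dot-product attention with the 2D feature as query.
    Returns one fused vector per 2D region: [2D feature | attended 3D context].
    """
    dim_3d = len(feats_3d[0])
    fused = []
    for i, f2d in enumerate(feats_2d):
        # Region correlation: keep only the 3D regions linked to this 2D region.
        partners = [j for j, c in enumerate(correlation[i]) if c >= threshold]
        if not partners:
            # No correlated 3D region: use a zero context vector.
            context = [0.0] * dim_3d
        else:
            # Attention: weight the correlated 3D regions by similarity to f2d.
            scale = math.sqrt(len(f2d))
            scores = [dot(f2d, feats_3d[j]) / scale for j in partners]
            weights = softmax(scores)
            context = [
                sum(w * feats_3d[j][k] for w, j in zip(weights, partners))
                for k in range(dim_3d)
            ]
        fused.append(list(f2d) + context)
    return fused
```

In this reading, the correlation mask decides which 3D regions may contribute to a given 2D region, and the attention weights decide how much each of them contributes, which is one way the fusion could remain inspectable region by region. The resulting sequence of fused vectors would then be passed to a caption generator, which is not sketched here.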