Knowledge Commons of Institute of Automation, CAS
Title | Scene captioning with deep fusion of images and point clouds |
Authors | Yu, Qiang (1,3); Zhang, Chunxia (4); Weng, Lubin (1); Xiang, Shiming (2,3); Pan, Chunhong (2) |
Journal | PATTERN RECOGNITION LETTERS |
ISSN | 0167-8655 |
Publication date | 2022-06-01 |
Volume | 158 |
Pages | 9-15 |
Corresponding author | Yu, Qiang (qiang.yu@ia.ac.cn) |
Abstract | Recently, the fusion of images and point clouds has received considerable attention in various fields, such as autonomous driving, where its advantage over single-modal vision has been verified. However, it has not been extensively exploited in the scene captioning task. In this paper, a novel scene captioning framework with deep fusion of images and point clouds, based on region correlation and attention, is proposed to improve the performance of captioning models. In our model, a symmetrical processing pipeline is designed for point clouds and images. First, 3D and 2D region features are generated respectively through region proposal generation, proposal fusion, and region pooling modules. Then, a feature fusion module integrates the features according to the region correlation rule and the attention mechanism, which increases the interpretability of the fusion process and yields a sequence of fused visual features. Finally, the fused features are transformed into captions by an attention-based caption generation module. Comprehensive experiments indicate that the performance of our model reaches the state of the art. © 2022 Elsevier B.V. All rights reserved. |
Keywords | Scene captioning; Point cloud; Deep fusion |
DOI | 10.1016/j.patrec.2022.04.017 |
Indexed by | SCI |
Language | English |
Funding projects | National Key Research and Development Program of China [2020AAA0104903]; National Natural Science Foundation of China [62072039]; National Natural Science Foundation of China [62076242]; National Natural Science Foundation of China [61976208] |
Funding organizations | National Key Research and Development Program of China; National Natural Science Foundation of China |
WOS research area | Computer Science |
WOS subject category | Computer Science, Artificial Intelligence |
WOS accession number | WOS:000797731300002 |
Publisher | ELSEVIER |
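Document type | Journal article |
Item identifier | http://ir.ia.ac.cn/handle/173211/49501 |
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems_Advanced Spatio-Temporal Data Analysis and Learning |
Corresponding author | Yu, Qiang |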
Author affiliations | 1. Chinese Acad Sci, Inst Automat, Res Ctr Aerosp Informat, Beijing 100190, Peoples R China; 2. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China; 3. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China; 4. Beijing Inst Technol, Sch Comp Sci & Technol, Beijing 100081, Peoples R China |
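First author affiliation | Institute of Automation, Chinese Academy of Sciences |
Corresponding author affiliation | Institute of Automation, Chinese Academy of Sciences |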
Recommended citation, GB/T 7714 | Yu, Qiang, Zhang, Chunxia, Weng, Lubin, et al. Scene captioning with deep fusion of images and point clouds[J]. PATTERN RECOGNITION LETTERS, 2022, 158: 9-15. |
APA | Yu, Qiang, Zhang, Chunxia, Weng, Lubin, Xiang, Shiming, & Pan, Chunhong. (2022). Scene captioning with deep fusion of images and point clouds. PATTERN RECOGNITION LETTERS, 158, 9-15. |
MLA | Yu, Qiang, et al. "Scene captioning with deep fusion of images and point clouds." PATTERN RECOGNITION LETTERS 158 (2022): 9-15. |
Files in this item | There are no files associated with this item. |