Depth-Guided Vision Transformer With Normalizing Flows for Monocular 3D Object Detection
Cong Pan; Junran Peng; Zhaoxiang Zhang
Journal: IEEE/CAA Journal of Automatica Sinica
ISSN: 2329-9266
Year: 2024
Volume: 11; Issue: 3; Pages: 673-689
Corresponding authors: Pan, Cong (pancong2018@ia.ac.cn); Zhang, Zhaoxiang (zhaoxiang.zhang@ia.ac.cn)
Abstract: Monocular 3D object detection is challenging due to the lack of accurate depth information. Some methods estimate the pixel-wise depth maps from off-the-shelf depth estimators and then use them as an additional input to augment the RGB images. Depth-based methods attempt to convert estimated depth maps to pseudo-LiDAR and then use LiDAR-based object detectors or focus on the perspective of image and depth fusion learning. However, they demonstrate limited performance and efficiency as a result of depth inaccuracy and complex fusion mode with convolutions. Different from these approaches, our proposed depth-guided vision transformer with normalizing flows (NF-DVT) network uses normalizing flows to build priors in depth maps to achieve more accurate depth information. Then we develop a novel Swin-Transformer-based backbone with a fusion module to process RGB image patches and depth map patches with two separate branches and fuse them using cross-attention to exchange information with each other. Furthermore, with the help of pixel-wise relative depth values in depth maps, we develop new relative position embeddings in the cross-attention mechanism to capture more accurate sequence ordering of input tokens. Our method is the first Swin-Transformer-based backbone architecture for monocular 3D object detection. The experimental results on the KITTI and the challenging Waymo Open datasets show the effectiveness of our proposed method and superior performance over previous counterparts.
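Illustrative note: the abstract describes fusing RGB tokens and depth tokens through cross-attention, with a relative position term derived from relative depth values. This record gives no implementation details, so the following is only a minimal NumPy sketch of that general idea and not the authors' NF-DVT code. The function names (`depth_bias`, `cross_attention`), the bucketing of depth differences into a lookup table, and the `max_rel` parameter are all assumptions made for illustration; the sketch omits Swin-style windowing and the normalizing-flow depth prior entirely.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def depth_bias(depth_q, depth_k, table, max_rel=1.0):
    # Hypothetical relative-depth bias: bucket the pairwise depth difference
    # between query and key tokens and look up one scalar per bucket.
    rel = depth_q[:, None] - depth_k[None, :]            # (Nq, Nk)
    n_bins = table.shape[0]
    idx = ((rel + max_rel) / (2.0 * max_rel) * (n_bins - 1)).round().astype(int)
    idx = np.clip(idx, 0, n_bins - 1)
    return table[idx]                                     # (Nq, Nk)

def cross_attention(q_tokens, kv_tokens, Wq, Wk, Wv, bias):
    # Queries come from one branch, keys and values from the other.
    q = q_tokens @ Wq
    k = kv_tokens @ Wk
    v = kv_tokens @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1]) + bias
    attn = softmax(scores, axis=-1)
    return attn @ v, attn

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_tokens, dim = 16, 8
    rgb = rng.normal(size=(n_tokens, dim))        # RGB patch tokens
    dep = rng.normal(size=(n_tokens, dim))        # depth patch tokens
    patch_depth = rng.uniform(size=n_tokens)      # assumed per-patch depth in [0, 1]
    Wq, Wk, Wv = (rng.normal(size=(dim, dim)) for _ in range(3))
    table = rng.normal(size=15)                   # stand-in for a learned bias table

    bias = depth_bias(patch_depth, patch_depth, table)
    # Each branch attends to the other, with a residual connection.
    rgb_fused = rgb + cross_attention(rgb, dep, Wq, Wk, Wv, bias)[0]
    dep_fused = dep + cross_attention(dep, rgb, Wq, Wk, Wv, bias)[0]
    print(rgb_fused.shape, dep_fused.shape)
```

In a trained model the projection matrices and the bias table would be learned parameters, and each direction of the fusion would normally have its own weights; the sketch shares them only to stay short.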
Keywords: Monocular 3D object detection; normalizing flows; Swin Transformer
DOI: 10.1109/JAS.2023.123660
Indexed by: SCI
Language: English
Funding project: National Natural Science Foundation of China
Funding organization: National Natural Science Foundation of China
WOS research area: Automation & Control Systems
WOS category: Automation & Control Systems
WOS accession number: WOS:001179789200022
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Document type: Journal article
Identifier: http://ir.ia.ac.cn/handle/173211/54599
Collection: Academic Journals_IEEE/CAA Journal of Automatica Sinica
Recommended citation:
GB/T 7714: Cong Pan, Junran Peng, Zhaoxiang Zhang. Depth-Guided Vision Transformer With Normalizing Flows for Monocular 3D Object Detection[J]. IEEE/CAA Journal of Automatica Sinica, 2024, 11(3): 673-689.
APA: Cong Pan, Junran Peng, & Zhaoxiang Zhang. (2024). Depth-Guided Vision Transformer With Normalizing Flows for Monocular 3D Object Detection. IEEE/CAA Journal of Automatica Sinica, 11(3), 673-689.
MLA: Cong Pan, et al. "Depth-Guided Vision Transformer With Normalizing Flows for Monocular 3D Object Detection." IEEE/CAA Journal of Automatica Sinica 11.3 (2024): 673-689.
Files in this item:
File name/size: JAS-2023-0177.pdf (37784KB)
Document type: Journal article
Version: Published version
Access: Open access
License: CC BY-NC-SA