CASIA OpenIR > Pattern Recognition Laboratory
BEVBert: Multimodal Map Pre-training for Language-guided Navigation
Dong An; Yuankai Qi; Yangguang Li; Yan Huang; Liang Wang; Tieniu Tan; Jing Shao
2023-10
Conference Name: IEEE International Conference on Computer Vision
Source Publication: Proceedings of the IEEE International Conference on Computer Vision
Conference Date: 2023-10-02
Conference Place: Paris, France
Abstract

Large-scale pre-training has shown promising results on the vision-and-language navigation (VLN) task. However, most existing pre-training methods employ discrete panoramas to learn visual-textual associations, which requires the model to implicitly correlate incomplete, duplicate observations within the panoramas and may impair an agent's spatial understanding. We therefore propose a new spatially aware, map-based pre-training paradigm for VLN. Concretely, we build a local metric map to explicitly aggregate incomplete observations and remove duplicates, while modeling navigation dependency in a global topological map. This hybrid design balances VLN's demands for both short-term reasoning and long-term planning. Based on the hybrid map, we then devise a pre-training framework to learn a multimodal map representation, which enhances spatially aware cross-modal reasoning and thereby facilitates the language-guided navigation goal. Extensive experiments demonstrate the effectiveness of the map-based pre-training route for VLN, and the proposed method achieves state-of-the-art results on four VLN benchmarks.
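The hybrid design summarized above can be illustrated with a minimal sketch. This is an assumed, heavily simplified illustration of the two map types the abstract names (a local metric grid that aggregates duplicate observations, and a global topological graph of visited viewpoints); all function names, shapes, and data structures here are hypothetical and are not the paper's actual implementation.

```python
import numpy as np

def build_local_metric_map(view_feats, cell_ids, grid_size=21, dim=8):
    """Aggregate per-view features into a local metric (BEV) grid.
    Views that project to the same grid cell are averaged, so
    duplicate observations collapse into a single cell feature."""
    flat = np.zeros((grid_size * grid_size, dim))
    counts = np.zeros(grid_size * grid_size)
    for feat, cell in zip(view_feats, cell_ids):
        flat[cell] += feat
        counts[cell] += 1
    occupied = counts > 0
    flat[occupied] /= counts[occupied, None]  # mean over duplicates
    return flat.reshape(grid_size, grid_size, dim), occupied.reshape(grid_size, grid_size)

class TopoMap:
    """Global topological map: nodes are visited viewpoints with a
    feature each; edges link navigable neighbors. Planning over this
    graph supports the long-term side of the hybrid design."""
    def __init__(self):
        self.nodes, self.edges = {}, set()
    def add_node(self, vp_id, feat):
        self.nodes[vp_id] = feat
    def add_edge(self, a, b):
        self.edges.add(frozenset((a, b)))
```

Under this reading, short-term reasoning operates on the dense local grid while long-term planning selects the next node in the sparse global graph; the pre-training objective would then tie both representations to the instruction text.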

Indexed By: EI
Language: English
Is Representative Paper
Sub-direction Classification: Robot perception and decision-making
Planning Direction of the National Key Laboratory: Multimodal collaborative cognition
Paper Associated Data: No data
Document Type: Conference Paper
Identifier: http://ir.ia.ac.cn/handle/173211/56611
Collection: Pattern Recognition Laboratory
Affiliation:
1. Institute of Automation, Chinese Academy of Sciences
2.School of Future Technology, UCAS
3.Australian Institute for Machine Learning, University of Adelaide
4.SenseTime Research
5.Nanjing University
6.Shanghai AI Laboratory
Recommended Citation (GB/T 7714):
Dong An, Yuankai Qi, Yangguang Li, et al. BEVBert: Multimodal Map Pre-training for Language-guided Navigation[C], 2023.
Files in This Item:
File Name/Size: bevbert.pdf (1722 KB)
DocType: Conference Paper
Access: Open Access
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.