BEVBert: Multimodal Map Pre-training for Language-guided Navigation
Dong An; Yuankai Qi; Yangguang Li; Yan Huang; Liang Wang; Tieniu Tan; Jing Shao
2023-10
Conference name | IEEE International Conference on Computer Vision |
Proceedings title | Proceedings of the IEEE International Conference on Computer Vision |
Conference date | 2023-10-2 |
Conference location | Paris, France |
Abstract | Large-scale pre-training has shown promising results on the vision-and-language navigation (VLN) task. However, most existing pre-training methods employ discrete panoramas to learn visual-textual associations. This requires the model to implicitly correlate incomplete, duplicate observations within the panoramas, which may impair an agent’s spatial understanding. Thus, we propose a new spatially aware, map-based pre-training paradigm for VLN. Concretely, we build a local metric map to explicitly aggregate incomplete observations and remove duplicates, while modeling navigation dependency in a global topological map. This hybrid design balances the demands of VLN for both short-term reasoning and long-term planning. Then, based on the hybrid map, we devise a pre-training framework to learn a multimodal map representation, which enhances spatially aware cross-modal reasoning, thereby facilitating the language-guided navigation goal. Extensive experiments demonstrate the effectiveness of the map-based pre-training route for VLN, and the proposed method achieves state-of-the-art results on four VLN benchmarks. |
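Indexed by | EI |
Language | English |
Representative paper | Yes |
Seven major directions: sub-direction category | Robot perception and decision-making |
State Key Laboratory planned research direction category | Multimodal collaborative cognition |
Paper-associated dataset requiring deposit | No |
Document type | Conference paper |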
Item identifier | http://ir.ia.ac.cn/handle/173211/56611 |
Collection | Pattern Recognition Laboratory |
Author affiliations | 1. Institute of Automation, Chinese Academy of Sciences; 2. School of Future Technology, UCAS; 3. Australian Institute for Machine Learning, University of Adelaide; 4. SenseTime Research; 5. Nanjing University; 6. Shanghai AI Laboratory |
Recommended citation (GB/T 7714) | Dong An, Yuankai Qi, Yangguang Li, et al. BEVBert: Multimodal Map Pre-training for Language-guided Navigation[C], 2023. |
Files in this item |
File name/size | Document type | Version type | Access type | License |
bevbert.pdf (1722KB) | Conference paper | | Open access | CC BY-NC-SA |