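Research on Visual Localization in Large-Scale Indoor and Outdoor Scenes Based on 3D Semantic Maps (基于三维语义地图的室内外大场景视觉定位研究)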
时天欣
2020-05
Pages: 90
Degree type: Master's
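Chinese abstract (English translation)
    Visual localization is a key technology in computer vision, with wide application in 3D reconstruction, simultaneous localization and mapping (SLAM), augmented reality, autonomous driving, and other fields. This thesis systematically studies visual localization in large-scale indoor and outdoor scenes, focusing on improving its accuracy, robustness, and practical applicability. The main research content and contributions are as follows:
    (1) To address the low rate of correct image retrieval that traditional localization methods show under environmental change, a localization method is proposed that uses semantic information to filter retrieved images. The main contributions are: each 3D point in an ordinary sparse 3D model is assigned a semantic label, and 3D points that are detrimental to the localization task are removed according to their semantic class; the labeled 3D points are projected onto the semantically segmented query image, and the number of semantically consistent 3D points is counted and used as the semantic consistency score of the retrieved image; based on this score, each 2D-3D match is assigned a weight that sets its probability of being sampled in the RANSAC (RANdom SAmple Consensus) process, so that matches produced by correctly retrieved images are effectively selected and used for the final computation. Experiments on long-term visual localization benchmark datasets show that the overall localization accuracy of this method is higher than that of mainstream methods.
    (2) Building on the above method, and to address the instability of traditional handcrafted features under indoor and outdoor environmental change, a localization method using a dense semantic point cloud and hybrid features is proposed. The main contributions are: current mainstream deep-learning-based features are systematically evaluated, with their localization performance summarized and usage recommendations given; learned and handcrafted features are used jointly so that each contributes its strengths in different environments, improving localization accuracy across those environments; a dense semantic 3D model is used, which not only accommodates all feature types but also lets more 3D points take part in projection, increasing the discriminative power of the semantic consistency score. Experiments on long-term visual localization benchmark datasets show that this method outperforms current mainstream localization methods.
    (3) From the perspective of deploying visual localization in practice, and taking into account current cloud computing capacity and the time cost of localization methods, a fast localization method suited to practical use is proposed. The main contributions are: the most suitable image retrieval method is selected for indoor and for outdoor localization tasks respectively; through a design that balances algorithmic complexity and computational efficiency, the computational efficiency of image retrieval is improved while retrieval accuracy is maintained; through systematic experiments, a localization strategy based on position clustering and filtering by inlier ratio is proposed. Tests in three real-world deployment scenarios show that the method meets the accuracy and efficiency requirements of current practical localization applications.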
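    The scoring and weighting steps described in contribution (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the function names, the pinhole projection with intrinsics `K` and a candidate query pose `(R, t)`, and the rule that each match inherits the score of the retrieved image it came from are all hypothetical choices made for the example.

```python
import numpy as np


def semantic_consistency_score(points_3d, point_labels, K, R, t, seg_map):
    """Count 3D points whose label equals the segmentation label at their projection.

    Assumed inputs: points_3d is (N, 3) in world coordinates, (R, t) maps world
    to camera coordinates, and seg_map is an (H, W) array of class ids.
    """
    points_3d = np.asarray(points_3d, dtype=float)
    point_labels = np.asarray(point_labels)
    cam = points_3d @ np.asarray(R).T + np.asarray(t)   # world -> camera
    in_front = cam[:, 2] > 1e-9                          # drop points behind the camera
    cam, labels = cam[in_front], point_labels[in_front]
    uvw = cam @ np.asarray(K).T                          # pinhole projection
    uv = np.round(uvw[:, :2] / uvw[:, 2:3]).astype(int)  # pixel coordinates (u, v)
    h, w = seg_map.shape
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    uv, labels = uv[inside], labels[inside]
    return int(np.sum(seg_map[uv[:, 1], uv[:, 0]] == labels))


def sampling_probabilities(match_image_ids, image_scores):
    """Turn per-image scores into per-match sampling probabilities.

    Assumption: a match inherits the score of the retrieved image that produced it.
    """
    weights = np.array([image_scores[i] for i in match_image_ids], dtype=float)
    total = weights.sum()
    if total <= 0:                                       # no evidence: fall back to uniform
        return np.full(len(weights), 1.0 / len(weights))
    return weights / total


def draw_minimal_sample(probs, sample_size, rng):
    """Draw match indices for one RANSAC iteration, biased by the probabilities.

    Needs at least sample_size matches with non-zero probability.
    """
    return rng.choice(len(probs), size=sample_size, replace=False, p=probs)
```

    In a full pipeline the indices returned by `draw_minimal_sample` would feed a minimal pose solver inside the RANSAC loop; that solver is outside the scope of this sketch.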
English abstract
    Visual localization is a key technology in computer vision and is widely used in 3D reconstruction, simultaneous localization and mapping (SLAM), augmented reality, autonomous driving, and other applications. This thesis studies visual localization in large-scale indoor and outdoor scenes, with a focus on improving the accuracy, robustness, and practical usability of visual localization methods. The main work and contributions are summarized below:
    (1) Traditional visual localization methods retrieve the correct images at a low rate under large condition changes. To address this, the thesis proposes a visual localization method that uses semantic information to select the correctly retrieved images. The main contributions are as follows. First, each 3D point in the sparse model is given a semantic label, and 3D points whose semantic class makes them unhelpful for visual localization are removed. Second, all visible 3D points are projected into the segmented query image, and the number of 3D points whose labels agree with the labels at their projections is counted; this number serves as the semantic consistency score of the retrieved image. Third, each 2D-3D match is assigned a weight according to the semantic consistency score, and the weight determines the probability that the match is sampled in the RANSAC (RANdom SAmple Consensus) process. As a result, matches produced by correctly retrieved images are selected effectively and used for the final calculation. Experiments on challenging long-term visual localization benchmark datasets show that the overall localization accuracy of the proposed method is higher than that of state-of-the-art methods.
    (2) Building on the above method, and to address the instability of traditional handcrafted features under outdoor and indoor condition changes, the thesis proposes a visual localization method that uses a dense semantic 3D model and hybrid features. The main contributions are as follows. First, state-of-the-art learned features are tested systematically, and conclusions and recommendations about their localization performance are given. Second, handcrafted and learned features are used together so that the strengths of each are exploited and localization accuracy improves across different viewing conditions. Third, a dense semantic 3D model is used, which not only accommodates all types of features but also provides more 3D points for projection, making the semantic consistency scores more discriminative. Experiments on challenging long-term visual localization benchmark datasets show that the proposed method outperforms state-of-the-art approaches.
    (3) From the perspective of practical application, a fast localization method suited to practical use is proposed, taking into account current cloud computing capacity and the time consumption of visual localization methods. The main contributions are as follows. First, the most suitable image retrieval methods are selected for indoor and outdoor visual localization tasks respectively. Second, through a design that balances algorithm complexity and computational efficiency, the computational efficiency of image retrieval is improved while its accuracy is preserved. Third, through systematic experiments, a visual localization strategy based on position clustering and selection by inlier ratio is proposed. Experimental results in three practical application scenarios show that the proposed method meets the requirements of practical applications for localization accuracy and computational efficiency.
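    The strategy in contribution (3), clustering by position and selecting by inlier ratio, can be illustrated with the sketch below. The abstract does not give the clustering algorithm or any thresholds, so the greedy radius-based clustering, the `radius` and `min_ratio` parameters, and the `(pose, num_inliers, num_matches)` candidate format are assumptions made only for illustration.

```python
import numpy as np


def cluster_by_position(positions, radius):
    """Greedily group retrieved database images by camera position.

    An image joins the first cluster whose seed position lies within `radius`;
    otherwise it starts a new cluster. Returns lists of image indices.
    """
    positions = np.asarray(positions, dtype=float)
    clusters = []  # each entry: (seed_position, [member indices])
    for idx, pos in enumerate(positions):
        for seed, members in clusters:
            if np.linalg.norm(pos - seed) <= radius:
                members.append(idx)
                break
        else:
            clusters.append((pos, [idx]))
    return [members for _, members in clusters]


def select_by_inlier_ratio(candidates, min_ratio):
    """Pick the candidate pose with the highest inlier ratio.

    `candidates` holds one (pose, num_inliers, num_matches) tuple per cluster.
    Returns None when no candidate reaches `min_ratio`.
    """
    best_pose, best_ratio = None, -1.0
    for pose, num_inliers, num_matches in candidates:
        if num_matches == 0:          # nothing to estimate a pose from
            continue
        ratio = num_inliers / num_matches
        if ratio >= min_ratio and ratio > best_ratio:
            best_pose, best_ratio = pose, ratio
    return best_pose
```

    Under these assumptions, one pose would be estimated per cluster of retrieved images, and `select_by_inlier_ratio` would then keep the best-supported pose or report failure by returning `None`.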
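Keywords: visual localization, image retrieval, semantic segmentation, learned features, pose estimation
Language: Chinese
Seven major research directions, sub-direction category: 3D vision
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/39138
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems_Robot Vision (多模态人工智能系统全国重点实验室_机器人视觉)
Recommended citation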
GB/T 7714
时天欣. 基于三维语义地图的室内外大场景视觉定位研究[D]. 中国科学院自动化研究所. 中国科学院大学,2020.
Files in this item
File name/size | Document type | Version type | Access type | License
时天欣论文.pdf (14335KB) | Thesis | | Open access | CC BY-NC-SA