Research on Robot Visual Localization Methods Based on Spatio-Temporal Invariance (基于时空不变性的机器人视觉定位方法研究)


	Research on Robot Visual Localization Methods Based on Spatio-Temporal Invariance
	辛喆 (Xin Zhe)
	2020-05
Pages	130
Degree type	Doctoral
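Abstract (from Chinese)	Visual localization has long been a research focus in mobile robotics. It has advanced considerably in recent years and can already be deployed in some indoor scenes. When a robot works autonomously in outdoor scenes over long periods, however, environmental changes such as illumination, season, and dynamic objects, together with similar-looking scenes and viewpoint changes, interfere with localization and cause it to fail. To meet the development and application needs of a new generation of intelligent robots, studying visual localization algorithms based on spatio-temporal invariance, which improve robustness to environmental and viewpoint changes, is of great theoretical and practical value. This dissertation relies on vision sensors, focuses on outdoor scenes with complex environmental changes, and addresses the key technologies of visual localization by combining image processing with deep learning. The main work and contributions are summarized as follows:
1. For the visual place recognition problem in visual localization, this dissertation proposes a place recognition method that fuses temporal and spatial information to raise the recognition success rate under environmental and viewpoint changes. For the spatial constraint, the method first extracts landmarks from the current frame with multi-scale superpixel segmentation, then extracts abstract features of the landmarks with a convolutional neural network to obtain an overall representation of the image. It then computes the similarity between the current frame and every keyframe in the map and finds the most similar keyframe. The similarity measure takes the spatial position of each landmark into account to reduce mismatches. For the temporal constraint, after obtaining single-image place recognition results over a period of time, the method reconstructs those results according to the co-visibility relationship, which converts sequence place recognition into a probabilistic voting problem and reduces the effect of similar scenes on place recognition.
2. To address the interference of non-salient regions with visual place recognition and feature matching, this dissertation proposes a weakly supervised method for autonomous scene understanding. The method is based on a convolutional neural network trained with metric learning, and it autonomously selects discriminative landmarks from the scene and extracts their features for image representation. The output of the network is a response map: the higher the response value, the more discriminative the corresponding image region. Besides the response map, the network also yields the abstract features of each landmark. Training is end-to-end and weakly supervised, requiring only image-level annotations as supervision. The method was validated on both visual place recognition and visual relocalization tasks, and the results show that it significantly improves the success rate of visual localization under environmental changes. In addition, to address unstable feature matching under drastic environmental change, the method further introduces landmark-based feature matching, which constrains the search range of feature points to reduce mismatches and improve the stability of metric localization.
3. To address the low repeatability of feature point detection caused by environmental and viewpoint changes, this dissertation proposes a self-supervised method for detecting saliency-consistent local features. The method is based on a convolutional neural network and uses self-supervised training to discover relatively stable regions in the image, so feature point detection stays consistent even when the environment changes drastically. The network design considers two important properties of local features, discriminability and repeatability, and a new loss function enables end-to-end optimization. The network consists of two submodules: a pixel-level keypoint localization network and a region-level co-saliency ranking network. The former analyzes the texture around a pixel to decide whether it is a keypoint. The latter analyzes semantic information to compute the global saliency of the image region containing that point, which keeps keypoints out of regions with repetitive texture, where they would impair descriptor extraction and matching. The method was tested on multiple datasets, and the results show that it effectively improves the stability of feature point detection compared with current mainstream local feature extraction algorithms.
4. For the key technologies of visual localization based on spatio-temporal invariance, this dissertation integrates the methods above and designs and implements a medium- to long-term outdoor mapping and localization system. During map construction, the system considers how to use information from other sensors to optimize the camera pose and build a globally consistent map. Moreover, because the storage space and computing power of the robot platform limit the map size, the system compresses the map according to observability and coverage once construction is complete, retaining the most useful information while preserving the localization success rate as far as possible. The localization function mainly integrates the research results above, with attention to real-time performance and robustness to spatio-temporal changes.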
English abstract	Visual localization has always been one of the fundamental problems in the field of mobile robots. In recent years, it has made great progress and can be practically applied to certain specific scenarios. In long-term outdoor applications, however, the appearance of a place changes with viewpoint, illumination, dynamic objects, and other factors, and all these variations make visual localization more difficult. To meet the development and application requirements of the new generation of intelligent robots, research on visual localization for long-term outdoor scenarios is of great theoretical and practical value. This dissertation is based on vision sensors and focuses on the key problems of spatio-temporal invariant visual localization, combining image processing and deep learning in a series of studies. The main contributions are summarized as follows:
1. For the visual place recognition problem, we propose an approach that combines temporal and spatial information to improve recognition performance under environment and viewpoint changes. In terms of spatial constraints, we first employ multi-scale superpixel segmentation to generate landmarks. Then, we use a convolutional neural network to extract high-level representations for all landmarks. To obtain the recognition result, the similarity between the current image and every keyframe in the pre-built map is computed, and the most similar keyframe is selected. The similarity measure considers the spatial position of landmarks to reduce mismatches. In terms of temporal constraints, we first reconstruct each query in the sequence using a categorical variable. The reconstruction is based on single-image matching and the co-visibility relationship between keyframes. Then, we demonstrate that visual place recognition can be transformed into a probabilistic voting problem. We employ temporal information to overcome the perceptual aliasing caused by environmental changes.
2. To address the interference of repetitive and non-salient areas in the scene with visual place recognition and feature matching, we propose a weakly supervised approach to scene understanding that can effectively detect discriminative landmarks for place representation. The approach is based on a convolutional neural network trained with a metric learning strategy. The output of the network is a heatmap: the higher the response value, the more discriminative the location. In addition to the heatmap, the network also produces the high-level features corresponding to each landmark. The network is trained end-to-end with only image-level annotations as supervision. We verified the effectiveness of the proposed approach on visual place recognition and visual relocalization tasks. Experimental results show that it can significantly improve the success rate of visual localization under environmental variations. Moreover, we propose a landmark-based feature matching strategy. By constraining the search range of feature points, this strategy reduces mismatches and increases the stability of the pose estimation process.
3. Under severe environment and viewpoint changes, a local feature detector should still be able to discover repeatable interest points for generating feature correspondences. We therefore propose an approach that detects consistent local features with co-saliency ranking optimization. The approach is based on a convolutional neural network trained with a self-supervised strategy to discover the stable areas in the image. When the environment changes drastically, the network can still maintain the consistency of local feature detection. We consider two important attributes of local features, discriminability and repeatability, and design a novel loss function to achieve end-to-end optimization. The approach has been tested on multiple datasets, and we show that the repeatability of the proposed network outperforms state-of-the-art hand-crafted and learning-based detectors by a significant margin.
4. Focusing on the key problems of spatio-temporal invariant visual localization, we integrate all the proposed approaches to design a long-term outdoor mapping and localization system. In the mapping process, we use a loosely coupled sensor fusion strategy to optimize the camera pose and build a globally consistent map. Moreover, because storage space and computing resources limit the map size, we employ a map compression approach that retains the most effective information in the map while preserving the success rate of visual localization. The localization process mainly integrates the proposed approaches, taking into account efficiency and robustness in long-term scenarios.
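Keywords	visual localization ; visual place recognition ; pose estimation ; deep learning
Subject area	Computer perception ; Computer neural networks
Discipline category	Engineering::Computer Science and Technology (degrees in engineering or science may be conferred)
Language	Chinese
Document type	Dissertation
Item identifier	http://ir.ia.ac.cn/handle/173211/39237
Collection	Graduates_Doctoral dissertations
Recommended citation (GB/T 7714)	辛喆. 基于时空不变性的机器人视觉定位方法研究[D]. 中国科学院自动化研究所. 中国科学院自动化研究所,2020.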

Files in this item
File name/size	Document type	Version type	Access type	License
辛喆-基于时空不变性的机器人视觉定位方法（19273KB）	Dissertation		Restricted access	CC BY-NC-SA
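
Illustrative sketch
The abstract's first contribution turns sequence place recognition into a probabilistic voting problem, using single-image matching results and the co-visibility relationship between keyframes. The record does not give the formulation, so the following is only a minimal sketch of one way such voting could look. It is not the dissertation's method. The function name `vote_place`, the `neighbor_weight` parameter, and the rule for sharing votes with co-visible keyframes are all hypothetical.

```python
from collections import defaultdict


def vote_place(similarities, covisible, neighbor_weight=0.5):
    """Pick one map keyframe for a short sequence of query frames.

    similarities: list with one dict per query frame, {keyframe_id: score},
        holding non-negative single-image similarity scores.
    covisible: dict {keyframe_id: set of keyframe ids observing shared landmarks}.
    neighbor_weight: hypothetical discount applied to votes that are passed
        on to co-visible keyframes (an assumption, not from the source).
    """
    votes = defaultdict(float)
    for scores in similarities:
        total = sum(scores.values())
        if total <= 0:
            continue  # this frame carries no usable evidence
        for keyframe, score in scores.items():
            # Normalise the scores of one frame into a categorical distribution.
            p = score / total
            votes[keyframe] += p
            # Share part of the vote with keyframes that see the same landmarks.
            for neighbor in covisible.get(keyframe, ()):
                votes[neighbor] += neighbor_weight * p
    if not votes:
        return None, {}
    best = max(votes, key=votes.get)
    return best, dict(votes)


if __name__ == "__main__":
    # Three consecutive query frames scored against keyframes A, B, C.
    frames = [
        {"A": 0.6, "B": 0.4},
        {"A": 0.2, "B": 0.3, "C": 0.5},
        {"B": 0.7, "C": 0.3},
    ]
    graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
    best, votes = vote_place(frames, graph)
    print(best, votes)
```

In this sketch a keyframe that is only the top match for a single frame can still lose to a keyframe supported by several frames and by its co-visible neighbors, which is the intuition behind using temporal information against perceptual aliasing.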