Localization is a prerequisite for the autonomous motion of service robots. Because visual sensors offer a wide application range, rich information, and low cost, visual localization has received much attention. Representing visual image information with deep networks enables robust localization of service robots under viewpoint variation, illumination change, and in complex environments, which is significant in both research and applications. Considering the two situations of an unknown environment and a known map, this thesis conducts research on robot visual localization based on deep features. The main contents are as follows:
Firstly, the research background and significance of this thesis are given. The development of visual SLAM in unknown environments and of visual localization based on a known map is reviewed, and the content and structure of this thesis are introduced.
Secondly, to address the problem that the hand-crafted feature points of traditional visual SLAM methods in unknown environments are sensitive to environmental change, a visual simultaneous localization and mapping method based on point features and object-level semantic features, termed PO-SLAM, is proposed. It utilizes a convolutional neural network to extract object-level semantic features from the image, and then applies a fast geometric segmentation algorithm to separate objects from the background with the help of the depth image, which facilitates point-object feature association within the same frame. Furthermore, point-point and object-object feature associations between frames are constructed. A point-object constraint is imposed on the optimization process so that matched points share the same semantic information, which benefits data association. A relative-position-invariance constraint between objects is also exerted to improve the accuracy of localization. In addition, feature points associated with dynamic objects are removed according to the point-object association results, which improves the robustness of visual localization in dynamic environments. Experiments on datasets and in real scenarios demonstrate the effectiveness of the proposed method.
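The point-object association and dynamic-point removal steps above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the mask representation, the class names, and the set of dynamic categories are all assumptions made for the example.

```python
import numpy as np

# Assumed set of object categories treated as dynamic (hypothetical).
DYNAMIC_CLASSES = {"person", "dog"}

def associate_points(points_uv, masks, labels):
    """Associate 2D feature points with segmented objects.

    points_uv: list of (u, v) integer pixel coordinates.
    masks: list of (H, W) boolean segmentation masks, one per object.
    labels: class name per mask.
    Returns one semantic label (or None for background) per point.
    """
    assoc = [None] * len(points_uv)
    for mask, label in zip(masks, labels):
        for i, (u, v) in enumerate(points_uv):
            if assoc[i] is None and mask[v, u]:
                assoc[i] = label
    return assoc

def filter_dynamic(points_uv, assoc):
    """Drop feature points that landed on a dynamic object."""
    keep = [lab not in DYNAMIC_CLASSES for lab in assoc]
    kept = [p for p, k in zip(points_uv, keep) if k]
    return kept, keep
```

In the actual system the surviving, semantically labelled points would additionally enter the optimization with the point-object and object-object constraints described above.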
Thirdly, a scene coordinate regression network based on spatial feature transformation, termed SFT-CR, is proposed, which realizes camera relocalization through an implicit scene representation. On one hand, to solve the problem that the standard convolution operation lacks intrinsic invariance to the image geometric transformations caused by viewpoint change, a spatial feature transformation network is designed to explicitly transform the convolutional features, which effectively improves the robustness of the deep features to geometric transformation and thus the accuracy of coordinate estimation. On the other hand, a loss function based on maximum likelihood is constructed and the uncertainty of coordinate estimation is introduced: the scene coordinate regression network provides not only the 3D coordinates corresponding to the 2D pixels in the image but also the uncertainty of each coordinate. Based on this uncertainty, the obtained 2D-3D correspondences are screened before the 6D camera pose is estimated by the PnP algorithm, which improves both the accuracy and the efficiency of localization. Besides, the CoordConv operation is introduced into the feature extraction to enhance feature discrimination in weakly textured areas. The effectiveness of the proposed method is verified by relocalization experiments on the datasets.
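The maximum-likelihood loss with predictive uncertainty and the uncertainty-based screening of 2D-3D correspondences can be sketched as below. This assumes an isotropic Gaussian model per pixel and a simple keep-ratio screening rule; the actual loss form and screening criterion in the thesis may differ.

```python
import numpy as np

def nll_loss(pred_xyz, gt_xyz, log_sigma):
    """Gaussian negative log-likelihood for 3D coordinate regression.

    pred_xyz, gt_xyz: (N, 3) predicted and ground-truth scene coordinates.
    log_sigma: (N,) predicted log standard deviation per pixel (assumed
    isotropic over the three coordinate axes).
    """
    sq = np.sum((pred_xyz - gt_xyz) ** 2, axis=-1)
    # residual term weighted by uncertainty, plus a log-variance penalty
    return np.mean(sq / (2.0 * np.exp(2.0 * log_sigma)) + 3.0 * log_sigma)

def screen_correspondences(pts2d, pts3d, sigma, keep_ratio=0.5):
    """Keep the most certain fraction of 2D-3D matches before running PnP."""
    order = np.argsort(sigma)              # ascending uncertainty
    n = max(4, int(len(order) * keep_ratio))  # PnP needs >= 4 points
    idx = order[:n]
    return pts2d[idx], pts3d[idx]
```

The screened correspondences would then be passed to a standard PnP solver (e.g. a RANSAC-based one) to recover the 6D camera pose; screening first both stabilizes the pose estimate and reduces the number of RANSAC iterations needed.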
Fourthly, a multi-task learning-based visual place recognition method, termed MTA, is proposed. Training based solely on the triplet ranking task, as in existing methods, ignores the compactness of the global features of images from adjacent positions, which leads to weak generalization. To address this problem, a new binary classification task is introduced, in which all query-positive pairs are regarded as the positive class and all query-negative pairs as the negative class. A binary classification loss is designed to constrain the feature distances of all positive pairs to be less than those of all negative pairs. The global feature extraction network is trained by combining this binary classification task with the existing triplet ranking task, which enhances intra-place global feature compactness and inter-place feature separability, and therefore improves the generalization of the model. Moreover, an attention module is proposed and embedded into the global feature extraction network so that, during feature aggregation, the network pays more attention to the regions useful for place recognition, which increases the discrimination of the global image feature. The proposed method is experimentally validated on public datasets and in a real environment.
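The combination of the triplet ranking task and the new binary classification task can be sketched as follows. The logistic form of the pair classifier, the bias value, and the weighting factor `alpha` are assumptions for illustration; the thesis's exact loss formulation may differ.

```python
import numpy as np

def triplet_loss(d_pos, d_neg, margin=0.1):
    """Standard triplet ranking term on query-positive / query-negative
    feature distances (arrays of equal length)."""
    return np.mean(np.maximum(d_pos - d_neg + margin, 0.0))

def binary_pair_loss(d_pos, d_neg, bias=0.5):
    """Assumed binary classification term: logistic regression on pair
    distances, with positive pairs labelled 1 and negative pairs 0,
    pushing all positive-pair distances below all negative-pair ones."""
    def bce(d, y):
        p = 1.0 / (1.0 + np.exp(d - bias))  # smaller distance -> higher score
        return -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return np.mean(np.concatenate([bce(d_pos, 1.0), bce(d_neg, 0.0)]))

def total_loss(d_pos, d_neg, alpha=1.0):
    """Multi-task objective combining both terms."""
    return triplet_loss(d_pos, d_neg) + alpha * binary_pair_loss(d_pos, d_neg)
```

Unlike the triplet term, which only compares pairs within one triplet, the classification term acts on all pairs jointly, which is what tightens intra-place features and separates inter-place ones across the whole batch.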
Fifthly, a localization software architecture for service robots based on an offline hybrid map is proposed, which integrates the proposed PO-SLAM with the relocalization methods SFT-CR and MTA under the ROS framework. According to the size of the environment and the task requirements, a hybrid map combining explicit and implicit maps is constructed, exploiting the adaptability of the MTA method to large-scale environments and the relocalization accuracy of the SFT-CR method. The explicit map contains the local feature points, 3D map points, and keyframe poses from PO-SLAM, as well as the global keyframe features extracted by MTA. The implicit map is established by training the scene coordinate regression network for the important areas that require high localization accuracy. When tracking fails in PO-SLAM, the robot recovers its pose using the MTA or SFT-CR method, achieving reliable and stable localization, which is verified by navigation experiments in an indoor office environment.
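The fallback logic on tracking failure can be sketched as below. This is a hypothetical selection policy, assuming the implicit-map coverage is given as axis-aligned rectangles and that the two relocalizers are exposed as callables; the actual ROS integration is more involved.

```python
# Hypothetical fallback policy for tracking loss in the hybrid-map system:
# prefer the high-accuracy implicit map (SFT-CR) when the robot's last known
# position lies inside a covered "important area", otherwise fall back to
# place recognition against the explicit map (MTA).

def relocalize(position_hint, sftcr_areas, run_sftcr, run_mta):
    """position_hint: rough last-known (x, y) position.
    sftcr_areas: list of (xmin, xmax, ymin, ymax) rectangles covered by
    trained scene coordinate regression networks.
    run_sftcr / run_mta: callables returning an estimated pose or None.
    Returns (pose, method_name).
    """
    x, y = position_hint
    for xmin, xmax, ymin, ymax in sftcr_areas:
        if xmin <= x <= xmax and ymin <= y <= ymax:
            pose = run_sftcr()
            if pose is not None:
                return pose, "SFT-CR"
    # Outside covered areas (or if SFT-CR failed): retrieve the most
    # similar keyframe by global feature and recover the pose from it.
    return run_mta(), "MTA"
```

Once a pose is recovered, PO-SLAM tracking can resume against the explicit map from that pose.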
Finally, conclusions are drawn and future work is discussed.
Key Words: Visual localization, Semantic SLAM, Relocalization, Scene coordinate regression network, Visual place recognition, Offline hybrid map