Knowledge Commons of Institute of Automation,CAS
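Research on a Visual Localization System for Complex Indoor Scenes Based on a C/S Architecture (基于 C/S 架构的室内复杂场景视觉定位系统研究)
Wang Chao (王超)
2020-05
Pages | 82
Degree type | Master's
Chinese abstract |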
English abstract | With the growing popularity of mobile robots, demand for indoor robot localization is growing as well. Mobile robots are expected to localize and navigate accurately indoors and then carry out a series of tasks. Indoor localization has a wide range of applications in factories, homes, shopping malls, and similar settings. Visual localization is a feasible way to achieve accurate indoor localization. In particular, estimating the camera's six-degree-of-freedom position and orientation in the world coordinate system from a single image is easy to operate. However, indoor environments change in complex ways, and a single image contains less information than a video, so improving the accuracy, robustness, and efficiency of a single-image localization system remains challenging. This thesis studies single-image visual localization in complex indoor scenes. The main contributions are as follows:
1. A medium-sized test dataset of a complex indoor scene is constructed. Because existing public datasets are too small and their test samples lack ground truth, this thesis uses the NavVis laser-vision SLAM scanner to scan three floors of Wanda Plaza in Shijingshan, Beijing, covering a total area of 8,000 square meters, and obtains 1,567 panoramic images along with depth images. A total station (electronic rangefinder) assists the localization to improve the accuracy of the 3D point cloud model. From each panorama, 36 perspective views with different viewing directions are generated by perspective synthesis. The 3D point coordinates corresponding to each pixel of a perspective view are obtained from the 3D point cloud model, yielding a 3D scene database of 56,412 images. In addition, 4,701 images of the scene are captured with three different models of mobile phone as the test set. The intrinsic parameters of each phone are calibrated individually with the planar checkerboard method. The extrinsic parameters between the phone and the NavVis device are estimated with EPnP on the test images, giving the ground-truth pose of each test image.
2. A visual localization pipeline based on image retrieval is designed. To address the low efficiency and low accuracy of current visual localization methods, the system first compares three mainstream image retrieval algorithms (BoW, DisLoc, and InLoc), weighs their advantages and disadvantages, and selects the BoW model as the final retrieval scheme. To further improve localization efficiency, a GPU is used to accelerate RootSIFT feature extraction on top of BoW similarity retrieval. The retrieval results are then re-ranked according to the RootSIFT feature matching results. The 10 most similar images and their feature matches are passed to a Perspective-n-Point (PnP) localization algorithm to estimate the pose of the query image. Experimental results show that this method improves efficiency by more than a factor of 10 over the traditional algorithm while offering high accuracy and strong robustness.
3. A complete indoor visual localization system based on a Client-Server architecture is implemented. The system's localization time is 200 ms, its position accuracy is 6 cm, its orientation accuracy is 0.32°, and its localization success rate is 91.6%, which meets the needs of practical applications. The system consists of two modules: the client module, which captures the scene image taken by the user, compresses and uploads it to the server, and receives the returned localization result; and the server module, which loads the 3D scene model reconstructed offline, performs similarity retrieval and pose estimation on the images received from the client, and sends the estimated pose back to the client.
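Keywords | data acquisition, image retrieval, camera pose estimation, C/S architecture, indoor visual localization
Language | Chinese
Seven major research directions: sub-direction | 3D vision
Document type | Thesis
Item identifier | http://ir.ia.ac.cn/handle/173211/39255
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems_Robot Vision
Recommended citation (GB/T 7714) | Wang Chao. Research on a Visual Localization System for Complex Indoor Scenes Based on a C/S Architecture [D]. Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences, 2020.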
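The abstract names RootSIFT as the feature used for matching and re-ranking but does not spell out the transform. As a minimal sketch, assuming the commonly used definition (L1-normalize a non-negative SIFT descriptor, then take the element-wise square root), the conversion might look like the following. The function names and the 4-element toy descriptor are illustrative assumptions and are not taken from the thesis, which also runs feature extraction on a GPU rather than in pure Python.

```python
import math


def root_sift(descriptor, eps=1e-12):
    """Convert a non-negative SIFT descriptor to RootSIFT.

    Step 1: L1-normalize (SIFT entries are non-negative, so the L1 norm
            is simply the sum of the entries).
    Step 2: take the square root of every entry.
    """
    total = sum(descriptor) + eps  # eps guards against an all-zero descriptor
    return [math.sqrt(v / total) for v in descriptor]


def l2_norm(vec):
    """Euclidean length of a vector."""
    return math.sqrt(sum(v * v for v in vec))


# Hypothetical 4-D stand-in for a real 128-D SIFT descriptor.
toy = [4.0, 0.0, 9.0, 3.0]
rs = root_sift(toy)

print(rs)
print(l2_norm(rs))
```

A side effect of this definition is that the result has unit Euclidean length, so comparing RootSIFT vectors with Euclidean distance corresponds to comparing the original descriptors with the Hellinger kernel.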
Files in this item
File name/size | Document type | Version type | Access type | License
基于CS架构的室内复杂场景视觉定位系统研(14603KB) | Thesis | | Open access | CC BY-NC-SA
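The abstract reports a position accuracy of 6 cm and an orientation accuracy of 0.32° against ground-truth poses, without stating how the errors are computed. One common way to score an estimated pose, shown here purely as an assumption, is the Euclidean distance between camera positions and the angle of the relative rotation between the two orientations. All names and the sample poses below are hypothetical.

```python
import math


def rot_z(deg):
    """3x3 rotation matrix for a rotation of `deg` degrees about the z axis."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def translation_error(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth positions."""
    return math.dist(t_est, t_gt)


def rotation_error_deg(R_est, R_gt):
    """Angle (degrees) of the relative rotation R_est^T * R_gt.

    trace(R_est^T * R_gt) equals the sum of element-wise products,
    and the rotation angle is acos((trace - 1) / 2).
    """
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding noise
    return math.degrees(math.acos(cos_angle))


# Hypothetical poses: positions in meters, orientations as rotations about z.
t_gt, R_gt = (1.00, 2.00, 0.50), rot_z(10.0)
t_est, R_est = (1.06, 2.00, 0.50), rot_z(10.32)

t_err = translation_error(t_est, t_gt)
r_err = rotation_error_deg(R_est, R_gt)
print(t_err, r_err)
```

A success rate such as the reported 91.6% would then be the fraction of test images whose two errors both fall under chosen thresholds. The record does not give those thresholds, so none are assumed here.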