大规模场景的三维几何和语义建模研究 (Research on 3D Geometric and Semantic Modeling of Large-Scale Scenes)
周洋
2019-05-22
Pages: 142
Degree type: Doctoral
Chinese Abstract

With the widespread application of autonomous platform systems in various fields, increasingly higher demands are being placed on their capability and adaptability. The ability to perceive and interact with the environment is a basic property of any autonomous platform system, and 3D geometric and semantic modeling of the environment is a major component of environmental perception and interaction. This thesis presents a systematic study of the geometric and semantic modeling of large-scale scenes; the main work and contributions cover the following four aspects:

1. To address the misalignment between the point cloud models generated separately from aerial and ground images in large-scale scene reconstruction, a scene-mesh-based method for registering aerial and ground point cloud models is proposed. The method first reconstructs a surface mesh model from each point cloud to suppress noise. It then takes the aerial mesh as the reference model and selects facets on it that lie in locally flat, well-observed regions. From the correspondences established on these selected facets, the similarity transformation between the two models is computed, and the ground model is gradually aligned to the aerial model. To handle the false-termination phenomenon that may occur during alignment when the supporting matches are unevenly distributed, the method applies an energy minimization procedure to gradually eliminate the residual displacement between the two models. Experiments show that the method is applicable to point cloud registration in large-scale scenes.

2. To address the difficulty of preserving fine structures in surface mesh reconstruction of large-scale scenes, a new method is proposed that preserves detailed structures while still reconstructing large objects. To this end, a new visibility model is proposed that retains small objects while filtering out noise. In addition, by introducing a likelihood term into the energy model of the binary tetrahedron-labeling problem, the noise robustness of the method is further improved. To exploit the visibility information more fully, the method directly takes the point clouds corresponding to the original depth maps as input, and the proposed visibility model is adjusted to also use the visibility information of points that do not participate in the tetrahedralization. Experiments show that the proposed method can preserve scene details while reconstructing large scenes.

3. To address the lack of annotated data in the semantic segmentation of 3D models of special scenes such as ancient Chinese architecture, an active learning based method is proposed for the semantic segmentation of mesh models of ancient Chinese architecture. The method starts by fine-tuning an image segmentation network with a small number of annotated images; then, supervised by the semantic 3D model obtained by fusing the tentative semantic images, it iteratively selects a small number of unlabeled images for further fine-tuning of the network. This process usually yields good results within only a few iterations. The key question is which images to select for manual annotation. To select suitable images and balance segmentation quality against computational cost, two measures are proposed: observation uncertainty and observation divergence. To the best of our knowledge, this is the first active learning framework for the semantic segmentation of 3D models, and the experimental results verify its effectiveness.

4. To address the semantic segmentation of 3D point cloud models in data-imbalanced scenes such as street views, an active learning based method for the semantic segmentation of street-view point cloud models is proposed. To make it easy to obtain correspondences between the images and the 3D model and to exploit the geometric features of the 3D model effectively, the method first voxelizes the point cloud to obtain a voxel representation, and assigns the average normal of the points to each voxel to measure the neighborhood constraint. Within the active learning framework proposed in the previous chapter, the method applies an adaptive data over-sampling technique when fine-tuning the image segmentation network and a weighted image selection strategy when choosing query images, so as to handle the data imbalance that is pervasive in street-view scenes. Experimental results show that the method can effectively perform semantic segmentation of street-view point cloud models.

English Abstract

With the recent popularization of autonomous platform systems in various fields, the demands on their capability and adaptability keep increasing. As perception of and interaction with the environment are two salient features of any practical autonomous platform system, this thesis focuses on how to effectively and efficiently model large-scale 3D scenes both geometrically and semantically, which are two fundamental issues in scene perception and interaction. The main work and contributions are four-fold:

1. To tackle the misalignment between the point clouds generated separately from aerial images and ground images in large-scale scenes, a mesh-based registration approach is proposed. First, surface meshes are extracted from the two point clouds to filter out noise. Then, suitable facets on the aerial mesh are selected under the condition that they lie within a smooth local area and are well viewed by the reference cameras. The similarity transformation between the two models is computed from the correspondences established on the selected facets, and the two models are gradually aligned. To deal with the false-termination phenomenon that can arise during alignment when the supporting points are unevenly distributed, an energy minimization step is employed to gradually reduce the remaining gap between the two models. Experimental results show that the proposed method performs well for aerial-ground point cloud registration in large scenes.
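The core geometric step here is estimating a 3D similarity transformation from facet correspondences. The following is a minimal illustrative sketch, not the thesis's implementation: it applies the closed-form Umeyama estimator to a set of matched 3D points (for example, centroids of matched facets); the function name and inputs are assumptions made for illustration.

```python
import numpy as np

def estimate_similarity_transform(src, dst):
    """Closed-form similarity transform (scale s, rotation R, translation t)
    mapping src points onto dst points (Umeyama, 1991).

    src, dst: (N, 3) arrays of corresponding 3D points, e.g. centroids of
    matched facets on the ground and aerial meshes (illustrative choice).
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between target and source, then its SVD.
    H = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(H)

    # Guard against reflections so R is a proper rotation.
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0
    R = U @ D @ Vt

    # Scale from the source variance, translation from the centroids.
    var_src = src_c.var(axis=0).sum()
    s = (S * np.diag(D)).sum() / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t

# Usage sketch: align the ground points to the aerial model.
# s, R, t = estimate_similarity_transform(ground_pts, aerial_pts)
# aligned = s * ground_pts @ R.T + t
```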

2. To preserve scene details in large-scale surface reconstruction, a novel method is proposed. To this end, a new visibility model is introduced for noise reduction and detail preservation. A likelihood term is then added to the total energy of the binary labeling process to further improve the method's ability to filter out noise. To fully exploit the visibility information, the original 3D points recovered from the depth maps are used directly as input, and the visibility information of the points that are not used for tetrahedralization is also exploited by the proposed visibility model with some adjustments. Experimental results show that the proposed method balances the seemingly contradictory demands of automatically modeling large scenes while preserving their fine details.
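In generic form, a binary inside/outside labeling of this kind minimizes an energy over the tetrahedra that combines a visibility data term, the added likelihood term, and a pairwise smoothness term. The LaTeX sketch below shows only this generic structure under assumed weights alpha and beta; the concrete definitions of the terms are the thesis's own and are not reproduced here.

```latex
% Illustrative structure only; V_t, L_t, w_{st}, \alpha, \beta are placeholders.
E(l) \;=\; \sum_{t \in \mathcal{T}} V_t(l_t)
      \;+\; \alpha \sum_{t \in \mathcal{T}} L_t(l_t)
      \;+\; \beta \sum_{(s,t) \in \mathcal{N}} w_{st}\,\mathbb{1}[l_s \neq l_t],
\qquad l_t \in \{\text{inside},\,\text{outside}\}
```

Here V_t plays the role of the visibility data term, L_t the added likelihood term, and the pairwise term encourages consistent labels across adjacent tetrahedra; energies of this form are commonly minimized with an s-t graph cut.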

3. To alleviate the common lack of sufficient annotated data in semantic labeling, an active learning based method is proposed for the semantic labeling of mesh models of ancient Chinese architectures. The proposed method starts by fine-tuning a conventional semantic segmentation network with a few annotated images; then, based on the tentative segmentation results fused onto the 3D model, a small number of suitable images are automatically selected for human annotation to further fine-tune the network. This iterative process usually terminates within a few loops once satisfactory results are reached. The key question is how to select appropriate new images for human annotation. To this end, two measures are introduced, namely observation uncertainty and observation divergence, to balance segmentation quality and computational efficiency. To the best of our knowledge, the proposed method is the first active learning framework for the semantic segmentation of 3D models, and the experimental results validate its effectiveness.
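As a rough illustration of how per-image selection scores of this kind might be combined, here is a hedged Python sketch. The entropy-based uncertainty, the disagreement-based divergence, the weight w, and all function names are assumptions for illustration; they are not the thesis's exact definitions of observation uncertainty and observation divergence.

```python
import numpy as np

def observation_uncertainty(prob_map):
    """Mean per-pixel entropy of a softmax probability map of shape (H, W, C).
    An illustrative stand-in for an uncertainty measure."""
    eps = 1e-12
    ent = -(prob_map * np.log(prob_map + eps)).sum(axis=-1)
    return ent.mean()

def observation_divergence(pred_labels, model_labels, valid_mask):
    """Fraction of pixels where the 2D prediction disagrees with the label
    rendered from the fused semantic 3D model (illustrative definition)."""
    diff = (pred_labels != model_labels) & valid_mask
    return diff.sum() / max(valid_mask.sum(), 1)

def select_query_images(candidates, k=10, w=0.5):
    """Pick the k unlabeled images scoring highest on a weighted combination
    of the two measures; `candidates` maps image ids to tuples of
    (prob_map, pred_labels, model_labels, valid_mask)."""
    scores = {}
    for img_id, (prob, pred, model, mask) in candidates.items():
        scores[img_id] = (w * observation_uncertainty(prob)
                          + (1 - w) * observation_divergence(pred, model, mask))
    return sorted(scores, key=scores.get, reverse=True)[:k]
```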

4. To tackle the data imbalance problem in semantic labeling, an active learning based method is proposed for the semantic labeling of point clouds of street-view scenes, where data imbalance is severe. To establish correspondences between the images and the 3D model and to exploit the geometric information of the 3D model, a voxel grid is created over the point cloud so that the 3D model is represented by a set of voxels, and the average normal of the points within each voxel is used to measure adjacency between neighboring voxels. Within the above active learning framework, an adaptive over-sampling technique is applied during fine-tuning and a weighted query-image selection criterion is used to handle the data imbalance. The experimental results show that the proposed method can effectively label the point clouds of street-view scenes.
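A minimal sketch of the voxelization step described above, assuming per-point normals are already available. The voxel size, function names, and the cosine-based adjacency weight are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def voxelize_with_normals(points, normals, voxel_size=0.2):
    """Quantize a point cloud into a voxel grid and attach to each voxel the
    (unit-length) average normal of the points falling inside it.

    points, normals: (N, 3) arrays; returns a dict keyed by voxel index.
    """
    keys = np.floor(points / voxel_size).astype(np.int64)
    accum = {}
    for key, n in zip(map(tuple, keys), normals):
        count, acc = accum.get(key, (0, np.zeros(3)))
        accum[key] = (count + 1, acc + n)

    avg_normals = {}
    for key, (count, acc) in accum.items():
        v = acc / count
        norm = np.linalg.norm(v)
        avg_normals[key] = v / norm if norm > 0 else v
    return avg_normals

def adjacency_weight(n1, n2):
    """Similarity of two neighboring voxels' average normals (cosine),
    an illustrative stand-in for a neighborhood-constraint weight."""
    return float(np.clip(np.dot(n1, n2), -1.0, 1.0))
```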

Keywords: Computer Vision; 3D Reconstruction; Point Cloud Registration; Semantic Segmentation; Active Learning
Language: Chinese
Research direction (sub-direction classification): 3D Vision
Document type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/23774
Collection: 模式识别国家重点实验室_机器人视觉 (National Laboratory of Pattern Recognition / Robot Vision)
Recommended citation
GB/T 7714
周洋. 大规模场景的三维几何和语义建模研究[D]. 中国科学院自动化研究所, 2019.
Files in this item
File name/size: 大规模场景的三维几何和语义建模研究.pd (19108 KB) | Document type: Thesis | Access: Open access | License: CC BY-NC-SA