Active Learning-Based 3D Semantic Segmentation of Large-Scale Complex Scenes (基于主动学习的大规模复杂场景三维语义分割)
Author | 荣梦琪 (Rong Mengqi)
Date | 2023-08-21
Pages | 144
Degree type | Doctoral
Abstract (Chinese) | In the fields of computer vision, photogrammetry, and unmanned systems, the rapid development of image-based 3D reconstruction and the widespread adoption of laser sensing devices have made 3D scene data increasingly easy to acquire, shifting computer perception of the surrounding environment from 2D toward 3D. 1. Proposed an active-learning 3D semantic segmentation method based on original images. 2. Proposed an active-learning 3D semantic segmentation method based on rendered images. 3. Proposed an active-learning 3D semantic segmentation method based on orthographic images. 4. Efficient 3D semantic segmentation practice in city-scale large scenes.
Abstract (English) | In the fields of computer vision, photogrammetry, and unmanned systems, with the rapid development of image-based 3D reconstruction techniques and the popularization of laser sensing devices, the acquisition of 3D scene data has become increasingly convenient, enabling computers to perceive their surrounding environment in 3D, moving beyond traditional 2D. In this context, 3D semantic segmentation has emerged as a fundamental and important task, aiming to accurately classify objects in 3D space into different semantic categories. While deep learning has made significant progress in image semantic segmentation in recent years, many challenges remain for complex large-scale 3D scenes. Firstly, annotating 3D data is labor-intensive and costly, resulting in a scarcity of large-scale 3D semantic segmentation datasets available for supervised training. Secondly, large-scale 3D scenes typically contain numerous, widely distributed object categories, making it difficult to develop a general 3D segmentation model that adapts to various types of scenes. Moreover, when fine-tuning pre-trained models for specific 3D scenes, the selection strategy for fine-tuning samples is often more complex than in 2D tasks. To address these issues, this dissertation adopts two key ideas. The first is to construct the correspondence between 3D models and multi-view 2D images through original images, rendered images, and orthographic images. By employing a two-step strategy of first performing 2D segmentation and then integrating the results into 3D, the dissertation achieves 3D segmentation of large-scale complex scenes.
Secondly, the dissertation introduces the idea of active learning, automatically selecting challenging 2D samples based on metrics such as 3D segmentation uncertainty and feature diversity, thereby achieving the adaptation of segmentation models across different domains and scenes, even with limited annotated data. Specifically, the main contributions and innovations of this dissertation are summarized as follows:
1. Proposed an active learning-based 3D semantic segmentation method using original images. Considering the strict correspondence between point clouds and pixels in large-scale 3D scenes based on image reconstruction, this dissertation first performs semantic segmentation on the images and then projects the segmentation results onto the 3D models for fusion. Meanwhile, during the fusion process, neighborhood semantic consistency constraints are applied to improve the robustness of global 3D fusion. Subsequently, the fused 3D segmentation results are measured for observation uncertainty and observation disparity, followed by the application of an active learning strategy to automatically select a limited number of challenging image samples for annotation, thereby fine-tuning the image semantic segmentation network. The experimental results on three outdoor large-scale 3D scenes acquired through different acquisition methods demonstrate that this method achieves accurate 3D semantic segmentation of large-scale 3D scenes with minimal image annotation requirements.
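The project-then-fuse step described in this contribution can be sketched as a per-point voting scheme, with image-level uncertainty driving the active selection. The sketch below is an illustrative simplification: the function names, the entropy-based uncertainty, and the mean-score image ranking are assumptions, not the dissertation's exact formulation.

```python
import numpy as np

def fuse_votes(point_ids, labels, n_points, n_classes):
    """Accumulate per-view 2D labels into per-point class histograms.

    point_ids: 1-D array, the 3D point index each labeled pixel projects to
    labels:    1-D array of the same length, predicted class per pixel
    """
    votes = np.zeros((n_points, n_classes), dtype=np.int64)
    np.add.at(votes, (point_ids, labels), 1)  # unbuffered scatter-add of votes
    return votes

def point_uncertainty(votes):
    """Normalized entropy of each point's vote histogram (0 = unanimous)."""
    p = votes / np.clip(votes.sum(axis=1, keepdims=True), 1, None)
    with np.errstate(divide="ignore", invalid="ignore"):
        ent = np.where(p > 0, -p * np.log(p), 0.0).sum(axis=1)
    return ent / np.log(votes.shape[1])

def rank_images(image_point_ids, uncertainty):
    """Score each image by the mean uncertainty of the 3D points it observes;
    the most disputed images are proposed for annotation first."""
    scores = [uncertainty[ids].mean() for ids in image_point_ids]
    return np.argsort(scores)[::-1]
```

A point observed with conflicting labels across views gets high entropy, so the images covering it rise to the top of the annotation queue; a neighborhood-consistency constraint, as in the dissertation, would additionally smooth the vote histograms before ranking.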
2. Proposed an active learning-based 3D semantic segmentation method using rendered images. In 3D models based on image reconstruction, inconsistencies between the semantic labels of the 3D model and the original images may arise due to factors such as lighting variations, dynamic object interference, and inaccurate camera pose estimation. These inconsistencies cause errors during global 3D fusion in the aforementioned method. To address this issue, this dissertation proposes a method based on rendered images, which can select appropriate rendering techniques based on scene characteristics and generate virtual viewpoint images from any location. Additionally, to tackle the common problem of imbalanced small-class samples in 3D semantic segmentation, this dissertation further proposes two strategies, namely region complexity and category diversity, in addition to segmentation uncertainty measurement. These strategies enhance the data selection capability of the active learning process, achieving a more balanced selection of samples. The experimental results demonstrate that this method achieves outstanding segmentation performance in large-scale aerial urban scenes and complex indoor scenes, particularly improving the segmentation accuracy of small-class objects and exhibiting the ability to discover unknown classes.
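The three selection cues named in this contribution (segmentation uncertainty, region complexity, category diversity) can be combined into one greedy view-selection loop. The sketch below is a hedged illustration: the weighting scheme, the edge-density proxy for region complexity, and the inverse-frequency diversity bonus are assumptions standing in for the dissertation's actual measures.

```python
import numpy as np

def select_rendered_views(probs, n_select, w=(1.0, 1.0, 1.0)):
    """Greedily pick rendered views for annotation from softmax maps.

    probs: (V, H, W, C) per-view class-probability maps. Three cues, mixed
    with weights `w` (all illustrative, not the thesis' exact formulation):
      - uncertainty: mean normalized per-pixel entropy,
      - region complexity: density of label changes between neighbor pixels,
      - category diversity: bonus for classes rarely seen in chosen views.
    """
    V, H, W, C = probs.shape
    labels = probs.argmax(axis=-1)
    with np.errstate(divide="ignore", invalid="ignore"):
        ent = np.where(probs > 0, -probs * np.log(probs), 0.0).sum(-1)
    unc = ent.reshape(V, -1).mean(1) / np.log(C)
    # Fraction of horizontally/vertically adjacent pixel pairs that disagree.
    edges = (np.diff(labels, axis=1) != 0).reshape(V, -1).mean(1) \
          + (np.diff(labels, axis=2) != 0).reshape(V, -1).mean(1)
    seen = np.zeros(C)  # per-class pixel counts over already-chosen views
    chosen = []
    for _ in range(n_select):
        div = np.array([np.mean(1.0 / (1.0 + seen[labels[v]])) for v in range(V)])
        score = w[0] * unc + w[1] * edges + w[2] * div
        score[chosen] = -np.inf  # never re-select a view
        v = int(score.argmax())
        chosen.append(v)
        seen += np.bincount(labels[v].ravel(), minlength=C)
    return chosen
```

Updating `seen` inside the loop is what makes the selection class-balanced: once a frequent class dominates the chosen set, views showing rarer classes score higher, which matches the small-class motivation stated above.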
3. Proposed an active learning-based 3D semantic segmentation method using orthographic images. Although multi-view images can capture 3D scene information comprehensively, redundant images can impose a significant computational burden during the semantic fusion and active learning stages. To address this issue, this dissertation proposes a method based on orthographic images, which can effectively represent the global scene with fewer image data. Additionally, high-resolution images do not require precise annotation for all pixels during the active learning process. Therefore, this dissertation introduces an adaptive connected region computation method, which selects irregular pixel regions with lower segmentation quality for annotation, further reducing the scale of annotated data. The experimental results demonstrate that this method significantly improves the efficiency of large-scale 3D scene semantic segmentation and outperforms the method based on multi-view images in terms of accuracy.
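The adaptive connected-region idea in this contribution, where only irregular low-quality pixel regions of the orthographic map are sent for annotation, can be sketched as a flood fill over a low-confidence mask. The threshold, the 4-connectivity, and the minimum region size below are illustrative knobs, not the dissertation's values.

```python
import numpy as np
from collections import deque

def low_confidence_regions(conf, thresh=0.5, min_size=2):
    """Group low-confidence pixels of an orthographic segmentation map into
    4-connected regions; only these regions would be proposed for annotation.

    conf: (H, W) per-pixel confidence of the current segmentation.
    Returns a list of regions, each a list of (y, x) pixel coordinates.
    """
    H, W = conf.shape
    low = conf < thresh
    seen = np.zeros_like(low, dtype=bool)
    regions = []
    for y in range(H):
        for x in range(W):
            if low[y, x] and not seen[y, x]:
                comp, queue = [], deque([(y, x)])
                seen[y, x] = True
                while queue:  # BFS flood fill over the low-confidence mask
                    cy, cx = queue.popleft()
                    comp.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < H and 0 <= nx < W and low[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
                if len(comp) >= min_size:  # drop isolated noisy pixels
                    regions.append(comp)
    return regions
```

Because annotation cost then scales with the area of the uncertain regions rather than with the full high-resolution image, this directly supports the data-reduction claim made above.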
4. Efficient 3D semantic segmentation practice in urban-scale large scenes. Applying the proposed methods and key technologies to real-world production, and thereby addressing practical problems in actual scenes, is of significant importance and value. This dissertation presents a real case study in Zhengzhou, Henan, to validate the feasibility and practicality of the proposed methods on large-scale urban real-world 3D models. The experimental results demonstrate fast and accurate semantic segmentation of large-scale real-world scenes, even with a small number of annotated images. Furthermore, the building information obtained through semantic segmentation can effectively support subsequent tasks such as vectorized modeling, providing valuable support for the construction of 3D digital models and the development of geographic information systems.
Keywords | Large-scale; Complex 3D scenes; 3D semantic segmentation; Active learning
Language | Chinese
Sub-direction classification (of the seven major directions) | 3D Vision
State Key Laboratory planning direction | Other
Associated dataset to be deposited | No
Document type | Doctoral dissertation
Identifier | http://ir.ia.ac.cn/handle/173211/52390
Collection | Doctoral dissertations of graduates
Recommended citation (GB/T 7714) | 荣梦琪. 基于主动学习的大规模复杂场景三维语义分割[D], 2023.
Files in this item |
File name / size | Document type | Version | Access | License
毕业论文-答辩后修改-签名版.pdf (22974 KB) | Doctoral dissertation | | Restricted access | CC BY-NC-SA