Research on Edge-Information-Assisted Image Segmentation Methods (边缘信息辅助的图像分割方法研究)


Title	Research on Edge-Information-Assisted Image Segmentation Methods (边缘信息辅助的图像分割方法研究)
Author	何昊 (He Hao)
Date	2022-05-25
Pages	144
Degree type	Master's
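Abstract (translated from Chinese)	Image segmentation is an important research area in computer vision. In academia, segmentation algorithms support downstream tasks such as object tracking and visual question answering; in industry, they are applied in scenarios such as autonomous driving and robot navigation. In recent years, with the growth of image data, the increase in computing power, and the rise of deep neural networks, many deep-learning-based image segmentation models have been designed, and these models achieve relatively high accuracy on several commonly used datasets. However, some problems in image segmentation remain hard to solve: large variations in object scale, high similarity between foreground and background, imbalance between foreground and background information, and complex scenes in which objects are dense or occlude one another. How to further improve the performance, efficiency, and robustness of image segmentation models in the face of these problems urgently needs in-depth study. This thesis uses edge-information-assisted deep neural networks to study image segmentation algorithms around these problems. The main research content and contributions are:
1. Based on an analysis of existing edge-assisted semantic segmentation models, this thesis argues that these models have two shortcomings on images with high foreground-background similarity: (1) they cannot produce strongly discriminative edge features or accurate edge predictions; (2) inaccurate edge features and edge predictions cannot effectively guide the segmentation branch to output accurate masks. To address both, this thesis proposes a differential edge detection module, RDM, which produces more precise edge predictions by optimizing the edge and non-edge regions simultaneously, and designs a point-based graph convolutional network module, PGM, which uses the precise edges produced by RDM to improve the segmentation features and output accurate mask predictions. The effectiveness of RDM and PGM is verified on transparent-object segmentation datasets, and the generalization ability of the model is verified on general semantic segmentation datasets.
2. For the imbalance between foreground and background information, this thesis argues that existing semantic segmentation methods cannot effectively model long-range information dependencies while highlighting foreground information. It therefore proposes PFM, a module that propagates semantic information in the form of sparse points. PFM first samples sparse points on two adjacent feature maps produced by the backbone, then passes the high-level semantic information at those points in the upper feature map to the lower, high-resolution feature map. Applying multiple PFMs between different features lets high-level semantic information flow downward layer by layer, yielding a feature map that is both high-resolution and rich in high-level semantics, which helps accurate semantic segmentation. Because the vast majority of sampled points lie on foreground objects and their edges, PFM can highlight foreground information while using a sparse attention mechanism to model long-range dependencies, without introducing excessive background noise into foreground objects. The effectiveness of PFM is verified on remote sensing image semantic segmentation datasets, and its generalization ability is verified on general semantic segmentation datasets.
3. Complex scenes in which objects are dense or occlude one another appear widely across image segmentation tasks. Some existing methods first detect object edges to help localize objects precisely and then segment the objects. These methods perform poorly in such scenes, mainly because directly detecting object edges there is not easy. This thesis therefore adopts a divide-and-conquer approach and splits edge detection into two simpler sub-tasks: expanding toward the edge from inside the object and contracting toward the edge from outside it. The place where the outputs of the two sub-tasks meet is the object's edge. A boundary squeeze module, BSM, is proposed to carry out the two sub-tasks. Because instance segmentation is primarily object-oriented, BSM applies well to instance segmentation; in addition, if each category in semantic segmentation is treated as an object, the borders between categories are also edges, so BSM applies to semantic segmentation as well. The effectiveness and generalization ability of the model are verified on multiple instance segmentation and semantic segmentation datasets.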
English abstract	Image segmentation is an important research area in the computer vision community. Its output can be widely applied to downstream research areas (such as object tracking and visual question answering) and to industry (such as robot navigation and autonomous driving). In recent years, with the growth of image data, the increase in computing resources, and the rise of deep learning, many deep-learning-based image segmentation algorithms have been proposed. These models have achieved high accuracy and robustness on some common datasets. However, in difficult situations, such as large scale variance of objects, highly similar appearance of foreground and background, complicated scenes, and instance overlap, the quality of the predicted masks is unsatisfactory. Under such circumstances, further improving the performance, efficiency, and robustness of image segmentation models urgently needs study. This thesis uses deep neural networks, assisted by edge information, to address some of the problems mentioned above. Its contributions are as follows:
1. After studying existing edge-assisted semantic segmentation models, this thesis finds that these models have two defects on images with high similarity between foreground and background. First, they cannot generate discriminative edge features and accurate edge predictions. Second, the undiscriminating edge features may give false guidance to the segmentation features, which harms mask prediction. This thesis therefore proposes a refined differential module, RDM, which optimizes the edge part and the non-edge part at the same time and thereby contributes to accurate edge prediction. It also proposes a point-based graph convolutional network module, PGM, which refines the mask features using the accurate edge prediction. The effectiveness of RDM and PGM is demonstrated on the glass-like object segmentation task, and their generalization is further verified on the general semantic segmentation task.
2. To address the imbalance between foreground information and background information, this thesis argues that existing semantic segmentation methods cannot effectively model long-range information dependencies while highlighting foreground information. It therefore proposes a point flow module, PFM, which transmits semantic information in the form of sparse points. PFM first samples sparse points from two adjacent feature maps produced by the backbone and then transfers the high-level semantic information at the corresponding points of the upper feature map to the lower, high-resolution feature map. Using multiple PFMs between different features makes the high-level semantic information flow down layer by layer, yielding a feature map with both high resolution and high-level semantic information, which is conducive to accurate semantic segmentation. Since most of the sampled points are located on foreground objects and their edges, PFM can use a sparse attention mechanism to model long-range information dependencies while highlighting foreground information, without introducing too much background noise to foreground objects. The effectiveness of PFM is verified on remote sensing image semantic segmentation datasets, and its generalization ability is verified on general semantic segmentation datasets.
3. Complex scenes with dense objects or instance occlusion exist widely in many image segmentation tasks. Some existing image segmentation methods first detect the edge of an object to help locate it accurately and then segment the object based on this edge. However, these methods do not work well in such scenes, mainly because directly detecting object edges there is not easy. This thesis therefore adopts a divide-and-conquer approach and splits edge detection into two simpler sub-tasks: expanding toward the edge from inside the object and contracting toward the edge from outside it. The intersection of the outputs of the two sub-tasks is the edge of the object. A boundary squeeze module, BSM, is proposed to accomplish the two sub-tasks. Because instance segmentation is mainly object-oriented, BSM applies well to the instance segmentation task. In addition, if each category in semantic segmentation is regarded as an object, the boundary between categories is also an edge, so BSM can also be applied to the semantic segmentation task. The validity and generalization ability of the model are verified on several instance segmentation and semantic segmentation datasets.
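Keywords	semantic segmentation; instance segmentation; edge detection
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/48654
Collection	Graduates_Master's theses
Recommended citation (GB/T 7714)	何昊. 边缘信息辅助的图像分割方法研究[D]. 中国科学院自动化研究所. 中国科学院自动化研究所,2022.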
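
The divide-and-conquer idea in contribution 3 of the abstract (grow from inside the object, shrink from outside, and take the place where the two meet as the edge) can be illustrated with a classical morphological analogue. The sketch below is not the thesis's BSM, which is a learned network module whose details are not given in this record. It only shows, on a binary mask, how two simpler "approach the edge" operations intersect in a boundary band. The function names `dilate` and `squeeze_boundary` and the 4-neighbour structuring element are assumptions made for illustration.

```python
import numpy as np


def dilate(mask: np.ndarray) -> np.ndarray:
    """One step of 4-neighbour binary dilation (hypothetical helper)."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    return (
        padded[1:-1, 1:-1]      # the pixel itself
        | padded[:-2, 1:-1]     # neighbour above
        | padded[2:, 1:-1]      # neighbour below
        | padded[1:-1, :-2]     # neighbour to the left
        | padded[1:-1, 2:]      # neighbour to the right
    )


def squeeze_boundary(mask: np.ndarray) -> np.ndarray:
    """Boundary band where the grown interior meets the grown exterior.

    Illustrative analogue only: the object region is expanded outward,
    the background region is expanded inward, and the pixels claimed by
    both are treated as the edge.
    """
    mask = mask.astype(bool)
    grown_inside = dilate(mask)     # interior expanding toward the edge
    grown_outside = dilate(~mask)   # exterior closing in on the edge
    return grown_inside & grown_outside


# Toy example: a 3x3 square object in a 7x7 image.
mask = np.zeros((7, 7), dtype=bool)
mask[2:5, 2:5] = True
boundary = squeeze_boundary(mask)
print(boundary.astype(int))
```

On this toy mask the band covers the outermost object pixels and the background pixels directly adjacent to them, while the object's centre pixel and the far background are excluded. A learned module would predict the two regions from image features instead of deriving them from a known mask.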

Files in this item
File name/size	Document type	Version type	Access type	License
边缘信息辅助的图像分割方法研究.pdf (63434 KB)	Thesis		Restricted access	CC BY-NC-SA