Research on Semantic Segmentation Methods for Remote Sensing Images Based on Information Fusion (基于信息融合的遥感图像语义分割方法研究)
曹勇
2024-05-19
Pages: 112
Subtype: Doctoral (博士)
Abstract

Remote sensing image semantic segmentation aims to classify ground objects at the pixel level from remote sensing data. The task has long been a research focus in remote sensing image processing and plays an important role in environmental monitoring, urban planning, disaster response, and digital city construction. In recent years, the introduction of deep learning has greatly advanced the field. Remote sensing images, however, differ from natural images in several respects: (1) they carry richer spectral information, typically including infrared or near-infrared bands in addition to the usual red, green, and blue; (2) they contain far more extensive contextual information, since they are usually much larger than natural images and must be cropped into patches before further processing; (3) they involve more complex ground-object types, with the phenomena of the same object exhibiting different spectra and different objects sharing the same spectrum being widespread, so different models handle different object types with differing skill. These characteristics place higher demands on semantic segmentation, yet existing methods do not fully exploit the rich information contained in remote sensing data. To address these problems, this dissertation combines the idea of information fusion with the remote sensing image semantic segmentation task, conducting in-depth studies at three levels — the data level, the feature level, and the model level — and proposes several effective information fusion methods. The main contributions are as follows:

1. A remote sensing image semantic segmentation method based on multispectral data fusion. The core idea is to split multispectral imagery into several three-band inputs, feed each into its own feature extraction network, and fuse the resulting features with a carefully designed pyramid network, thereby fully exploiting the rich spectral information in the data. Specifically, for four-channel high-resolution imagery containing an infrared or near-infrared band, the dissertation proposes a dual-stream fusion model: the bands are regrouped and passed to two separate feature extraction networks, and a novel stage pyramid pooling module fuses features of different scales from the two streams. This design both reuses model parameters pre-trained on large-scale natural image datasets and makes full use of the spectral information in remote sensing images. Experiments on multiple datasets demonstrate the effectiveness of the method (see the first sketch after this list).

2. A remote sensing image semantic segmentation method based on global context feature fusion. The core idea is to use a lightweight global feature extraction network to extract global context from the entire uncropped image and inject that context into the small patch being segmented, overcoming the limitation that a model can only gather context from the current patch and greatly extending its contextual range. Specifically, for large remote sensing images, the dissertation proposes a global feature fusion model in which a specially designed lightweight grouped Transformer extracts the global context, a local feature extraction network extracts features from the patch to be segmented, and a dedicated cross feature fusion module adaptively merges the global semantic information into the local patch features. The method achieves genuinely global context extraction for remote sensing image semantic segmentation, and extensive experiments confirm its effectiveness (see the second sketch after this list).

3. A remote sensing image semantic segmentation method based on multi-model collaborative fusion. The core idea is to ensemble several deep learning models, exploiting the fact that different models segment different ground-object classes with different degrees of success. Specifically, given the large number of existing deep segmentation models and their differing per-class abilities on remote sensing imagery, the dissertation proposes a head-level model fusion framework: the models share one feature extraction network and are ensembled only at the segmentation heads, which keeps the complexity of the ensemble low. In addition, a cooperative loss function built on a variance-covariance decomposition encourages the models to learn complementary, mutually distinct behavior, improving the overall performance of the ensemble. Results on multiple datasets show that the method is markedly effective and generalizes well (see the third sketch after this list).
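
To make the three designs above concrete, the sketches below illustrate each one in PyTorch. First, a minimal sketch of the data-level idea in contribution 1: a four-band image is regrouped into two three-band inputs, each passed through its own encoder, with multi-scale pooled features fused before the segmentation head. The band grouping (RGB vs. NIR-R-G), the tiny stand-in backbones, and the pooling scales are illustrative assumptions; in particular, PyramidFusion only approximates, rather than reproduces, the dissertation's stage pyramid pooling module.

```python
# A minimal sketch assuming a 4-band (NIR, R, G, B) input; all module
# sizes are illustrative, not the dissertation's exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

def tiny_backbone():
    # Stand-in for a 3-channel encoder pre-trained on natural images.
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())

class PyramidFusion(nn.Module):
    """Fuses two feature maps by pooling both at several pyramid scales."""
    def __init__(self, channels, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.proj = nn.Conv2d(2 * channels * len(scales), channels, 1)

    def forward(self, a, b):
        h, w = a.shape[-2:]
        pooled = []
        for s in self.scales:
            for f in (a, b):
                p = F.adaptive_avg_pool2d(f, s)
                pooled.append(F.interpolate(p, size=(h, w),
                                            mode="bilinear", align_corners=False))
        return self.proj(torch.cat(pooled, dim=1))

class DualStreamSegmenter(nn.Module):
    def __init__(self, num_classes=6):
        super().__init__()
        self.rgb_stream = tiny_backbone()   # sees R, G, B
        self.nir_stream = tiny_backbone()   # sees NIR, R, G
        self.fuse = PyramidFusion(64)
        self.head = nn.Conv2d(64, num_classes, 1)

    def forward(self, x):                   # x: (B, 4, H, W) = NIR, R, G, B
        rgb, nir = x[:, 1:4], x[:, 0:3]     # two 3-band regroupings
        f = self.fuse(self.rgb_stream(rgb), self.nir_stream(nir))
        return F.interpolate(self.head(f), size=x.shape[-2:],
                             mode="bilinear", align_corners=False)

out = DualStreamSegmenter()(torch.randn(2, 4, 256, 256))  # (2, 6, 256, 256)
```

Because each stream receives exactly three channels, both encoders can load weights pre-trained on natural images, which is the point of the band regrouping.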
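Next, a minimal sketch of the feature-level idea in contribution 2: a small Transformer encoder summarizes a downsampled view of the whole scene, and a cross-attention module injects those global tokens into the features of the patch being segmented. A standard nn.TransformerEncoder stands in for the dissertation's lightweight grouped Transformer, and the cross feature fusion module is approximated by plain cross-attention; all sizes are assumptions.

```python
# A minimal sketch: global tokens from the whole (downsampled) scene are
# fused into local patch features via cross-attention.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalEncoder(nn.Module):
    """Tokenizes a downsampled full image and runs a small Transformer."""
    def __init__(self, dim=64, depth=2, heads=4):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=8, stride=8)  # patchify
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 2, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, full_img):                 # (B, 3, 256, 256) scene view
        tok = self.embed(full_img).flatten(2).transpose(1, 2)  # (B, N, dim)
        return self.encoder(tok)                 # global context tokens

class CrossFusion(nn.Module):
    """Local features (queries) attend to global tokens (keys/values)."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, local_feat, global_tok):   # (B, C, h, w), (B, N, C)
        B, C, h, w = local_feat.shape
        q = local_feat.flatten(2).transpose(1, 2)        # (B, h*w, C)
        fused, _ = self.attn(q, global_tok, global_tok)
        fused = self.norm(q + fused)                     # residual connection
        return fused.transpose(1, 2).reshape(B, C, h, w)

class GlobalLocalSegmenter(nn.Module):
    def __init__(self, num_classes=6, dim=64):
        super().__init__()
        self.local = nn.Sequential(              # local feature extractor
            nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU())
        self.global_enc = GlobalEncoder(dim)
        self.fuse = CrossFusion(dim)
        self.head = nn.Conv2d(dim, num_classes, 1)

    def forward(self, patch, full_img):
        f = self.fuse(self.local(patch), self.global_enc(full_img))
        return F.interpolate(self.head(f), size=patch.shape[-2:],
                             mode="bilinear", align_corners=False)

model = GlobalLocalSegmenter()
patch = torch.randn(2, 3, 256, 256)              # crop to segment
scene = torch.randn(2, 3, 256, 256)              # whole image, downsampled
out = model(patch, scene)                        # -> (2, 6, 256, 256)
```

The global branch sees the scene only at low resolution, so it stays cheap while still giving every patch access to context beyond its own borders.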
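Finally, a minimal sketch of the model-level idea in contribution 3: one shared backbone, several segmentation heads ensembled at the head level, and a cooperative loss that rewards per-head accuracy while encouraging the heads to disagree. The simple variance-based diversity term below is an illustrative stand-in for the dissertation's variance-covariance decomposition, not its exact formulation.

```python
# A minimal sketch of head-level model fusion with a cooperative loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HeadEnsembleSegmenter(nn.Module):
    def __init__(self, num_classes=6, num_heads=3, dim=64):
        super().__init__()
        self.backbone = nn.Sequential(           # shared feature extractor
            nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU())
        self.heads = nn.ModuleList(              # ensemble only at the head
            nn.Conv2d(dim, num_classes, 1) for _ in range(num_heads))

    def forward(self, x):
        f = self.backbone(x)
        logits = [F.interpolate(h(f), size=x.shape[-2:], mode="bilinear",
                                align_corners=False) for h in self.heads]
        return torch.stack(logits)               # (num_heads, B, K, H, W)

def cooperative_loss(all_logits, target, diversity_weight=0.1):
    # Supervised term: every head must segment well on its own.
    ce = sum(F.cross_entropy(l, target) for l in all_logits) / len(all_logits)
    # Diversity term: variance of the heads' probability maps; maximizing
    # it pushes the heads toward complementary, decorrelated predictions.
    probs = all_logits.softmax(dim=2)            # softmax over classes K
    diversity = probs.var(dim=0, unbiased=False).mean()
    return ce - diversity_weight * diversity

model = HeadEnsembleSegmenter()
x = torch.randn(2, 3, 128, 128)
y = torch.randint(0, 6, (2, 128, 128))
logits = model(x)                                # per-head predictions
loss = cooperative_loss(logits, y)
pred = logits.mean(dim=0).argmax(dim=1)          # fuse heads by averaging
```

Because the backbone is shared, averaging the head logits at inference fuses the ensemble at negligible extra cost compared with a single model.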


Keywords: remote sensing image processing; semantic segmentation; information fusion; deep learning
Language: Chinese
Sub-direction classification: Image and Video Processing and Analysis
Planning direction of the State Key Laboratory: Multi-scale Information Processing
Document Type: Dissertation (学位论文)
Identifier: http://ir.ia.ac.cn/handle/173211/57390
Collection: 毕业生_博士学位论文 (Graduates: Doctoral Dissertations)
Recommended Citation (GB/T 7714):
曹勇. 基于信息融合的遥感图像语义分割方法研究[D],2024.
Files in This Item:
File Name/Size: 基于信息融合的遥感图像语义分割方法研究. (8052KB)
DocType: Dissertation (学位论文) | Access: Restricted access (限制开放) | License: CC BY-NC-SA