CASIA OpenIR > Graduates > Doctoral Dissertations
Research on Image Super-Resolution Algorithms for Remote Sensing Scenes
安泰 (An Tai)
2023-12
Pages: 130
Degree type: Doctoral

Abstract

Image super-resolution aims to predict high-resolution images corresponding to low-resolution inputs. Unlike computer vision tasks such as object detection and semantic segmentation, this task is highly ill-posed. Therefore, the primary objective of image super-resolution algorithms is to generate high-resolution images that satisfy specific requirements. In remote sensing, this specialized need is more prominent, encompassing diverse areas such as resource development, disaster monitoring, land cover mapping, and ecological preservation. Unlike natural images, remote sensing data exhibit diverse imaging conditions, a wide range of target scales, significant variations in feature density, and explicit demand orientation. These characteristics render remote sensing image super-resolution an open-ended problem.

In recent years, significant progress has been made in image super-resolution tailored for remote sensing scenarios, providing vital support for the analysis, interpretation, and application of remote sensing data. However, these methods often encounter applicability challenges: 1) Feature extraction heavily relies on manual design, resulting in high complexity that limits the scalability and practicality of the models; 2) Overfitting is widespread, leading to poor generalization and thereby affecting the accuracy and confidence of predicted scenes; 3) Low data utilization and inadequate learning of complex patterns reduce the discriminability and interpretability of predicted scenes. To address these challenges, this study divides remote sensing image super-resolution into two categories: methods oriented toward reconstruction accuracy and methods oriented toward perceptual quality. Innovative solutions are proposed for the different scenarios to bridge the gap between research methods and practical needs. The main research contributions are summarized as follows:

1. A multi-image super-resolution approach based on global feature fusion is proposed. This method optimally exploits complementary information gathered from multiple low-resolution images captured within the same scene, minimizing the impact of environmental factors and thereby enhancing the reconstruction accuracy of the target scene. Specifically, we introduce a fusion module based on the Transformer architecture, enabling dynamic attention application to the same regions across any number of images. This significantly improves the effective use of image data. Furthermore, we utilize learnable embedding vectors for efficient extraction of global fused features, reducing the model's complexity and enabling its application to image sequences with weak temporal correlations. Experimental validation on the PROBA-V Kelvin dataset demonstrates the method's superior noise resilience and reconstruction accuracy.
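The dissertation record contains no code; purely as an illustrative sketch, the core idea of the fusion module, a learnable embedding that applies dynamic attention to the same region across an arbitrary number of co-registered frames, can be written as a single attention step. All names here (`fuse_frames`, `wk`, `wv`, the query vector) are hypothetical, and the actual model is a full Transformer rather than this one operation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_frames(frames, query, wk, wv):
    """Fuse N co-registered low-resolution feature maps into one.

    frames: (N, H, W, C) features from N images of the same scene
    query:  (C,) learnable embedding shared by all frames
    wk, wv: (C, C) key/value projection matrices
    """
    keys = frames @ wk                      # (N, H, W, C)
    vals = frames @ wv
    # Score the shared query against every frame, independently per pixel;
    # N is never hard-coded, so any number of frames is accepted.
    scores = np.einsum('nhwc,c->nhw', keys, query) / np.sqrt(len(query))
    attn = softmax(scores, axis=0)          # weights across the N frames
    return np.einsum('nhw,nhwc->hwc', attn, vals)   # (H, W, C) fused feature
```

Because the attention weights are computed per pixel across frames, the fusion is order-independent and degrades gracefully when temporal correlation between acquisitions is weak.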

2. A scale-arbitrary image super-resolution approach based on enhanced representation is proposed. This method efficiently utilizes the diverse scale levels of target information provided by multi-scale training, guiding the network to acquire richer feature representations and improving the reconstruction accuracy of the target scene. Specifically, we introduce an upsampling module that operates at arbitrary scales. The module consists of both a local dense prediction based on continuous representation and a resolution-specific refinement based on discrete representation, with the goal of expanding and embedding the target scene into the specified resolution. Furthermore, by implementing a self-ensemble strategy, we mitigate the underfitting issue faced in ultra-high magnification reconstruction. Experimental validation on several remote sensing datasets demonstrates that the proposed method achieves state-of-the-art reconstruction accuracy across multiple magnification levels while effectively recovering fine details.
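As a toy sketch of the continuous-representation idea behind the arbitrary-scale upsampler (in the spirit of local implicit image functions; the thesis modules additionally include a resolution-specific refinement not shown here), a decoder can query a latent feature grid at any continuous output resolution. The function and variable names are hypothetical.

```python
import numpy as np

def continuous_upsample(feat, out_h, out_w, mlp):
    """Decode a latent grid at an arbitrary output resolution.

    feat: (H, W, C) latent feature grid from the encoder
    mlp:  callable mapping a (C + 2,) vector (feature + relative
          coordinate) to a scalar pixel value
    """
    H, W, _ = feat.shape
    ys = (np.arange(out_h) + 0.5) / out_h   # continuous coords in (0, 1)
    xs = (np.arange(out_w) + 0.5) / out_w
    out = np.empty((out_h, out_w))
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            # Nearest latent cell and the offset from its centre; the
            # offset lets one grid support any target scale, including
            # non-integer magnifications.
            iy, ix = min(int(y * H), H - 1), min(int(x * W), W - 1)
            rel = np.array([y * H - iy - 0.5, x * W - ix - 0.5])
            out[i, j] = mlp(np.concatenate([feat[iy, ix], rel]))
    return out
```

Because the query coordinates are continuous, the same trained decoder serves integer and fractional magnifications alike, which is what makes multi-scale training across magnification levels possible.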

3. An image super-resolution approach based on multi-scale perceptual optimization is proposed. This method effectively leverages the driving force of the loss function in deep neural networks, thus enhancing the perceptual quality of the reconstructed scenes. Specifically, we introduce a perceptual loss based on multi-scale similarity distance. The loss explicitly supervises the similarity of multi-scale patches at both image and feature levels, guiding the network to adapt to different visual patterns at various scales. Furthermore, integrating adaptive attention based on normalized texture complexity into the loss function helps enhance the network's predictive capability for complex textures. Experimental validation on several remote sensing datasets demonstrates that the proposed method significantly improves a broad range of perceptual metrics, thereby enhancing the image quality of reconstructed scenes without imposing a significant burden on network training.
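To make the two ingredients of the proposed loss concrete (multi-scale supervision and texture-complexity weighting), here is a deliberately simplified sketch; the actual loss in the thesis is a multi-scale patch-similarity distance applied at both image and feature levels, which this toy version does not reproduce. All names are hypothetical.

```python
import numpy as np

def avg_pool(img, k):
    """Average-pool a 2-D image by factor k (crops ragged edges)."""
    H, W = img.shape
    return img[:H // k * k, :W // k * k].reshape(H // k, k, W // k, k).mean(axis=(1, 3))

def multiscale_loss(pred, target, scales=(1, 2, 4)):
    """Penalize errors at several scales, weighting complex textures more."""
    total = 0.0
    for s in scales:
        p, t = avg_pool(pred, s), avg_pool(target, s)
        # Texture-complexity weight: normalized deviation of the target,
        # so flat regions contribute less than richly textured ones.
        w = np.abs(t - t.mean())
        w = w / (w.sum() + 1e-8)
        total += np.sum(w * (p - t) ** 2)
    return total / len(scales)
```

Supervising several pooled scales at once is what forces the network to match both coarse structure and fine texture rather than optimizing one at the expense of the other.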

4. An image super-resolution approach based on a lightweight diffusion model is proposed. This method harnesses the strong image generation capability of diffusion models to enhance the perceptual quality of the reconstructed scenes. Specifically, we introduce a lightweight diffusion model that efficiently integrates low-resolution conditioning information through lightweight embedding modules based on cross-attention, and constructs its latent space with a parameter-free pixel rearrangement to reduce inference complexity. Furthermore, an accelerated sampling method further improves inference efficiency. Experimental validation on several remote sensing datasets demonstrates that the proposed method significantly improves image generation efficiency compared with existing diffusion-model approaches, and achieves a better balance between reconstruction accuracy and perceptual quality than fast image super-resolution methods.
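The parameter-free pixel rearrangement mentioned above is, in spirit, a space-to-depth transform: spatial resolution is traded for channels, so the diffusion model denoises a smaller tensor and the original image is recovered exactly by the inverse shuffle. The sketch below shows only this invertible rearrangement (names are hypothetical; the surrounding diffusion model is not reproduced).

```python
import numpy as np

def pixel_unshuffle(x, r):
    """(H, W, C) -> (H/r, W/r, C*r*r): parameter-free latent construction."""
    H, W, C = x.shape
    return (x.reshape(H // r, r, W // r, r, C)
             .transpose(0, 2, 1, 3, 4)
             .reshape(H // r, W // r, C * r * r))

def pixel_shuffle(z, r):
    """Exact inverse: (h, w, C*r*r) -> (h*r, w*r, C)."""
    h, w, cr2 = z.shape
    C = cr2 // (r * r)
    return (z.reshape(h, w, r, r, C)
             .transpose(0, 2, 1, 3, 4)
             .reshape(h * r, w * r, C))
```

Because the transform has no learned parameters and is exactly invertible, it avoids both the training cost and the reconstruction error of a learned autoencoder latent space, at the price of a less compressed representation.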

Keywords: remote sensing image super-resolution; deep learning; feature fusion; attention mechanism; diffusion model
Language: Chinese
Subfield classification (of the seven major research directions): Image and Video Processing and Analysis
State Key Laboratory planning direction: Multi-scale Information Processing
Dataset associated with the thesis requiring deposit: (unspecified)
Document type: Doctoral dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/54530
Collection: Graduates / Doctoral Dissertations
Affiliation: State Key Laboratory of Multimodal Artificial Intelligence Systems
Recommended citation (GB/T 7714):
安泰. 面向遥感场景的图像超分辨率算法研究[D], 2023.
Files in this item:
面向遥感场景的图像超分辨率算法研究.pdf (13189 KB) | Document type: dissertation | Access: restricted | License: CC BY-NC-SA

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.