Research on Multi-Source Remote Sensing Image Object Recognition for Complex Scenes (面向复杂场景的多源遥感图像目标识别研究)


	Research on Multi-Source Remote Sensing Image Object Recognition for Complex Scenes
	张鑫 (Zhang Xin)
	2022-05-24
Pages	122
Degree type	Doctoral
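Abstract (translated from Chinese)	Object recognition in remote sensing images, one of the research hotspots in earth observation technology, aims to automatically locate and classify objects of interest in an image (such as houses, ships, aircraft, and ports). With the development of deep learning, the deep features extracted by neural networks have stronger semantic representation ability and discriminability, which has further improved object recognition performance on remote sensing images. However, remote sensing images from different sources differ in temporal resolution, spatial resolution, spectral resolution, and object characteristics. A single image source may therefore describe objects in complex scenes with uncertainty and incompleteness, and may produce recognition errors. Multi-source image fusion and joint decision-making can improve object recognition accuracy on multi-source remote sensing images and effectively broaden the application scope of remote sensing big data. Against this background, multi-source remote sensing image object recognition for complex scenes has important research significance and application value.
In response to the difficulties of multi-source remote sensing image object recognition and to practical application needs, this dissertation follows a "from single source to multi-source" research route and studies three typical complex scenes: 1) In optical remote sensing images, an object takes on different appearances under different viewing angles, the intra-class distance can even exceed the inter-class distance, and objects seen from different viewing angles are hard to recognize accurately. 2) In SAR images, texture information is missing and speckle noise is severe, which hinders a neural network from learning low-level image features and high-level semantic features of objects. 3) Images from different sources are formed by different imaging mechanisms, and multi-source images of complex scenes contain registration errors or local deformations; the resulting semantic deviation limits the performance of object recognition based on multi-source image fusion. To address these difficulties, this dissertation studies deep-learning-based object recognition methods for multi-source remote sensing images and realizes "end-to-end" high-precision recognition of typical objects in remote sensing images of complex scenes. The specific research contents are as follows:
1. To address the difficulty of multi-view object recognition in optical remote sensing images, this dissertation proposes view-robust recognition methods for static and for dynamic objects. For static objects, it proposes a view-robust perspective-sensitive convolutional neural network that decouples the view attribute from the category attribute in the perspective feature space, which greatly improves the separability of objects in the perspective-sensitive feature space. For dynamic objects, it proposes a view-robust recognition framework that requires no image registration; the framework uses the locations of the objects of interest, the topological relationships among objects, and rotation-invariant point set matching to overcome the low feature separability of dynamic objects in multi-temporal, multi-view images. Experiments on a multi-view optical remote sensing dataset and a general multi-view object detection dataset verify the effectiveness of the proposed static and dynamic object recognition methods.
2. To address the difficulty of feature learning caused by missing texture information and severe speckle noise in SAR images, this dissertation proposes an object recognition method based on multi-task learning. The method introduces two auxiliary tasks, edge-shape feature learning and semantic-texture feature learning, to better learn the low-level features of the image; it uses a multi-task feature fusion module and a multi-scale feature alignment module to fuse the multi-modal features of the auxiliary tasks and the main task; and it uses a new region-level supervised learning scheme and differentiable multi-task loss weight learning to overcome the imbalance between positive and negative samples and the learning imbalance among tasks. On two public SAR image datasets, the proposed method achieves the best recognition performance among the compared baseline methods.
3. To address the large semantic deviation and the difficulty of feature fusion between multi-source remote sensing images, this dissertation proposes a multi-source image fusion and object recognition method based on the encoder-decoder structure of the Transformer network. The encoder extracts features from the multi-source images, and the decoder fuses the multi-source image features. Through task-driven feature extraction and prediction networks, several remote sensing image object recognition tasks, including image-level classification, pixel-level classification, and semantic segmentation, can be carried out within a single unified framework. Experiments on public datasets for three different recognition tasks verify the effectiveness of the proposed framework.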
English abstract	As one of the research hotspots in earth observation technology, remote sensing image object recognition aims to automatically locate and classify objects of interest in images (such as houses, ships, aircraft, and ports). With the development of deep learning, deep features learned by neural networks have stronger semantic representation ability and discriminability, and remote sensing object recognition performance has improved significantly. However, remote sensing images from different sources differ considerably in spatial resolution, temporal resolution, spectral resolution, and object characteristics. Recognition performance is therefore affected by the uncertainty and incompleteness of a single image source, especially in complex scenes. In contrast, multi-source image fusion and joint decision-making help improve recognition performance and broaden the application scope of remote sensing big data. In this context, multi-source remote sensing image object recognition for complex scenes has important research significance and application value.
In view of the difficulties and practical application requirements of object recognition in multi-source remote sensing images, this dissertation adopts the research route of "from single source to multi-source" and carries out research on the following three typical complex scenes: 1) In optical remote sensing images, an object presents different appearances under different viewing angles, and the intra-class distance can even exceed the inter-class distance, so objects seen from different viewing angles are difficult to recognize accurately. 2) In SAR images, texture information is missing and speckle noise is severe, which hinders neural networks from learning low-level image features and high-level semantic features of objects. 3) Different sensors have different imaging mechanisms, and there are registration errors or local deformations between multi-source images in complex scenes. The resulting semantic deviation restricts the performance of object recognition based on multi-source image fusion. To address these difficulties, this dissertation studies deep-learning-based object recognition methods for multi-source remote sensing images and aims to realize "end-to-end" high-precision recognition of typical targets in remote sensing images of complex scenes. The specific research contents of this dissertation are as follows:
1. To address the difficulty of multi-view object recognition in optical remote sensing images, static and dynamic object recognition methods that are robust to viewing angle are proposed. Specifically, for static object recognition, a perspective-sensitive convolution (PSC) neural network is presented. The perspective attribute and the semantic attribute are decoupled in the perspective feature space, and the separability of objects in the perspective-sensitive feature space is greatly improved. For dynamic object recognition, a framework is proposed that is robust to viewing angle and does not need image registration. By using the locations of the objects of interest, the topological relationships among objects, and rotation-invariant point set matching, the framework effectively overcomes the low feature separability of dynamic targets in multi-temporal and multi-view images. Experiments on a multi-view optical remote sensing dataset and a general multi-view object detection dataset demonstrate the effectiveness of the proposed static and dynamic object recognition methods.
2. To address the difficulty of feature learning caused by the lack of texture information and severe speckle noise in SAR images, a multi-task-learning-based object recognition method is proposed. Two auxiliary tasks (edge-shape feature learning and semantic-texture feature learning) are introduced to better learn low-level image features, and multi-task feature fusion and multi-scale feature alignment modules are introduced to merge the multi-modal features of the auxiliary and main tasks. Region-level supervised learning and differentiable multi-task loss weight learning are proposed to overcome the imbalance between positive and negative samples and the imbalance between tasks, respectively. The proposed method outperforms the baseline methods on two public SAR image datasets.
3. To address the large semantic deviation and the difficulty of feature fusion between multi-source remote sensing images, a novel multi-source image fusion and object recognition method based on the Transformer encoder-decoder structure is proposed. The encoder performs feature extraction, and the decoder merges the multi-source image features. With task-specific feature extraction and prediction networks, multiple remote sensing image object recognition tasks (such as image-level classification, pixel-level classification, and semantic segmentation) can be implemented within a unified framework. Experiments on public datasets for three different recognition tasks validate the effectiveness of the proposed framework.
Keywords	object recognition; deep learning; multi-source remote sensing images; convolutional neural network; Transformer network
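Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/48493
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems_Advanced Spatio-Temporal Data Analysis and Learning
Recommended citation (GB/T 7714)	张鑫. 面向复杂场景的多源遥感图像目标识别研究[D]. 中国科学院自动化研究所. 中国科学院自动化研究所,2022.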

Files in this item
File name/size	Document type	Version type	Access type	License
8.毕业论文.pdf (45330KB)	Thesis		Open access	CC BY-NC-SA