Research on Multi-Source Image Fusion Object Detection

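Title	Research on Multi-Source Image Fusion Object Detection (多源图像融合目标检测技术研究)
Author	胡宇轩 (Hu Yuxuan)
Date	2024-05
Pages	76
Degree type	Master's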
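Chinese abstract (translated)	Thanks to the rapid development of artificial intelligence over the past decade or so, the performance of object detection models has improved steadily. However, single-source object detection is limited by the fact that a single sensor captures only partial information about the physical world, and it falls short of the requirements of many applications. Imaging a target or region of interest simultaneously with multiple heterogeneous sensors yields multi-source images that contain complementary and redundant information and provide more comprehensive physical information for revealing and identifying targets, which has given rise to research on multi-source image fusion object detection. With deep learning, this field has made considerable progress, and current research has achieved results in road traffic, remote sensing monitoring, security surveillance, and similar scenarios, but the task requirements of some special application scenarios still call for further study. In all-day multi-source optical image fusion perception, the data acquired by certain sensors under non-ideal imaging conditions is of low quality, and local information in a single image may be damaged or missing, so algorithms that perform feature-level fusion effectively under such differing image quality need deeper study. In tasks where multi-source data enhances single-source visual perception, as in security surveillance, the need to reduce procurement, operation, and maintenance costs motivates pushing past the performance ceiling of single-source detection systems and improving single-source detectors, which requires further research on widely applicable multi-source-data-enhanced single-source object detection algorithms. In decision-level fusion perception for distributed Earth observation systems, to meet the requirement for high timeliness, each platform runs a lightweight object detector for preliminary detection and transmits the very small detection results to a fusion node for decision-level fusion, which requires decision-level fusion algorithms suited to lightweight detectors. This thesis studies multi-source image fusion object detection algorithms for these three special application scenarios. The work consists of three parts. (1) A multi-source image object detection algorithm with dual global spatial and channel attention. An examination of differing image quality across multi-source images suggests that strengthening the mining and fusion of complementary information helps improve multi-source fusion detectors. This study proposes a dual-attention Transformer feature fusion module for feature-level fusion in two-stream multi-source image fusion object detectors. The module uses global spatial attention across feature maps to mine and fuse complementary information, and global channel attention across feature maps to enhance redundant information. Experiments on visible-infrared datasets show that, compared with other detectors, the detector using this module achieves the best results on the selected datasets and performs better under differing image quality. (2) A multi-source-data-enhanced object detection algorithm that preserves the distinctiveness of memory units. The current state-of-the-art method uses a Key-Value memory module: during training, infrared deep features and visible deep features are stored in memory units as Key and Value respectively, and at deployment the infrared deep features are used to read the memory module and obtain pseudo-visible deep features. The memory module's parameters are updated by back-propagating loss gradients, which tends to leave memory units insufficiently distinctive and directly affects the model's generalization. This study proposes a memory module that preserves memory-unit distinctiveness and a multi-source-data-enhanced infrared object detector. The memory module uses a parameter update method with four main steps (deletion, refilling, allocation, and update) that preserves the distinctiveness of memory units. The detector uses intermediate results of the memory read to generate weights for feature-level fusion between pseudo-visible and infrared features. Experiments on visible-infrared traffic scene datasets verify the algorithm's effectiveness. (3) A decision-level fusion detection algorithm based on the raw output of lightweight detectors. Existing decision-level fusion methods fuse final detection results directly, do not make full use of the detector's raw output, and lose useful information. This study proposes a decision-level fusion algorithm based on the raw output of lightweight object detectors. It consists of an algorithm framework and fusion methods. The framework produces paired detection results from sets of detection results; the fusion methods use the raw detector output and are designed separately for two common kinds of lightweight detector whose raw output has the same composition as that of YOLOv3 and YOLOv8. Experiments on a visible-infrared aerial remote sensing dataset show that the proposed algorithm improves metrics such as mean average precision over all classes or reduces the number of false alarms. Qualitative tests on a panchromatic optical and synthetic aperture radar ship detection dataset verify the algorithm's applicability to distributed Earth observation satellite systems.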
English abstract	Owing to the rapid development of artificial intelligence over the past decade, the performance of object detectors has improved continuously. However, a single sensor can capture only partial information about the physical world, so object detectors that rely on a single image source are often limited and fall short of the requirements of many application scenarios. Imaging a target or region of interest simultaneously with multiple heterogeneous sensors yields multi-source images that contain complementary and redundant information. Such images provide more comprehensive physical information for revealing and identifying objects, which has given rise to research on multi-source image fusion object detection. With deep learning, multi-source image fusion object detection has made great progress, and current research has achieved results in scenarios such as road traffic, remote sensing monitoring, and security surveillance. However, further research is still needed to address the task requirements of some special application scenarios. In all-day multi-source optical image fusion perception, the data acquired by some sensors under non-ideal imaging conditions is of relatively low quality, and local information in a single image may be damaged or missing. Algorithms that conduct feature-level fusion effectively under such differing image quality conditions therefore need further study. In scenarios such as security surveillance, enhancing single-source visual perception with multi-source data aims to surpass the performance limits of single-source detection systems and improve single-source image object detectors, driven by the need to reduce system procurement, operation, and maintenance costs.
There is a continued need for research into widely applicable algorithms for multi-source-data-enhanced single-source object detection. In decision-level fusion perception for distributed Earth observation systems, each platform runs a lightweight object detector for preliminary detection in order to meet the system's requirement for high timeliness, and the detection results, which are very small in data volume, are transmitted to a fusion node for decision-level fusion. It is therefore necessary to study decision-level fusion algorithms suited to lightweight object detectors. This thesis studies multi-source image fusion object detection algorithms for the above three special application scenarios. The work consists of three parts: (1) A multi-source object detection algorithm based on dual global spatial and channel attention. Based on an investigation of differing image quality across multi-source images, we conclude that strengthening the mining and fusion of complementary information helps improve the performance of multi-source image fusion object detectors. In this study, a Dual Attention Transformer Feature Fusion Module is proposed for feature-level fusion in dual-stream multi-source image fusion object detectors. The module uses global spatial attention across feature maps to mine and fuse complementary information, and global channel attention across feature maps to enhance redundant information. A series of experiments on visible-infrared datasets shows that, compared with other detectors, the multi-source image fusion object detector using the proposed module achieves the best results on the selected datasets and performs better under differing image quality conditions. (2) A multi-source-data-enhanced object detection algorithm that preserves the discrepancy of memory units. The current state-of-the-art method adopts a memory network with a Key-Value structure.
In the training phase, infrared features and visible features are stored in the memory network as Keys and Values, respectively. In the deployment phase, infrared features are used to read the memory network and obtain pseudo-visible features. The parameters of the memory network are updated by back-propagating loss gradients, which tends to leave the memory units with too little discrepancy and directly affects the generalization performance of the model. In this study, a Discrepancy Preserving Memory Network and a multi-source-data-enhanced infrared object detector are proposed. The memory network uses a parameter update method with four main steps (deleting, refilling, allocating, and updating) to preserve the discrepancy of memory units. The detector uses the intermediate results of the read process to generate weights for feature-level fusion between the pseudo-visible features and the infrared features. A series of experiments on visible-infrared traffic scene datasets demonstrates the effectiveness of the proposed algorithm. (3) A decision-level fusion algorithm for object detection based on the raw output of lightweight object detectors. Existing decision-level fusion methods fuse the final detection results directly and do not make full use of the raw output of the object detector, losing useful information. In this study, a decision-level fusion algorithm that uses the raw output of lightweight object detectors is proposed. The algorithm consists of an algorithm framework and fusion methods. The framework generates paired detection results from sets of detection results. The fusion methods use the raw output of the object detectors and are designed separately for two common kinds of lightweight object detector whose raw output has the same composition as that of YOLOv3 and YOLOv8.
The experimental results on visible-infrared aerial remote sensing datasets demonstrate that the proposed algorithm improves metrics such as the mean average precision over all classes or reduces the number of false alarms. Qualitative tests on a panchromatic and synthetic aperture radar ship detection dataset verify the applicability of the proposed algorithm to distributed Earth observation satellite systems.
Keywords	multi-source image fusion object detection; dual global attention; memory module; raw output
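Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/57139
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems_Advanced Spatio-Temporal Data Analysis and Learning
Recommended citation (GB/T 7714)	胡宇轩. 多源图像融合目标检测技术研究[D],2024.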
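
Part (2) of the abstract describes reading a Key-Value memory with infrared features to obtain pseudo-visible features, and using an intermediate result of that read to weight the feature-level fusion. The record does not give the thesis's actual formulation, so the following is only a minimal sketch under stated assumptions: the read is taken to be a scaled dot-product softmax over the memory units, and the fusion weight is taken to be the largest attention value. The names `read_memory` and `fuse` and the fusion rule itself are hypothetical, and the four-step parameter update (deleting, refilling, allocating, updating) is not modeled here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(query, keys, values):
    """Read a Key-Value memory with a single query vector.

    query:  infrared feature vector (length d_k)
    keys:   one key vector per memory unit (each of length d_k)
    values: one value vector per memory unit (each of length d_v)
    Returns (pseudo_visible, attention); attention holds one weight per
    memory unit and is the intermediate result of the read.
    """
    scale = math.sqrt(len(query))
    # Assumption: similarity is a scaled dot product between query and key.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    attention = softmax(scores)
    d_v = len(values[0])
    # The pseudo-visible feature is the attention-weighted sum of the values.
    pseudo_visible = [
        sum(a * value[j] for a, value in zip(attention, values))
        for j in range(d_v)
    ]
    return pseudo_visible, attention


def fuse(infrared, pseudo_visible, attention):
    """Hypothetical fusion rule: trust the pseudo-visible feature in
    proportion to the peak attention value of the memory read.
    Assumes both feature vectors have the same length."""
    w = max(attention)
    return [(1.0 - w) * ir + w * pv for ir, pv in zip(infrared, pseudo_visible)]


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[10.0, 0.0], [0.0, 10.0]]
    infrared = [1.0, 1.0]
    pseudo_visible, attention = read_memory(infrared, keys, values)
    print(attention, pseudo_visible, fuse(infrared, pseudo_visible, attention))
```

In this toy setup the query matches both keys equally, so each memory unit receives the same attention and the pseudo-visible feature is the plain average of the two values. A learned memory would instead hold many units trained on paired infrared and visible features.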

Files in this item
File name/size	Document type	Version type	Access type	License
硕士学位论文签字版.pdf (7536 KB)	Thesis		Open access	CC BY-NC-SA