Research on Co-Salient Object Detection Technology and Its Applications

	Research on Co-Salient Object Detection Technology and Its Applications
	Wang Yu (王煜)
	2023-05-19
Pages	82
Degree type	Master's
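Chinese abstract (English translation)	Co-salient object detection is derived from salient object detection and aims to find the salient regions shared by a group of images. Because it can serve as a pre-processing step for vision tasks such as video analysis and image restoration, it has long been an active research topic. In recent years, with the rapid development of deep learning, the performance of co-salient object detection models has improved steadily. Some existing models rely on spatial modulation based on pixel-level correlation and on auxiliary training with category labels; these improve performance to some extent but also introduce problems. The modulation maps produced by the former have limited activation regions and do not make the model attend to the features of the target regions; the latter introduces category information, so the model detects on the basis of category characteristics, which leads to a degree of overfitting. In addition, collaborative perception has attracted wide attention in recent years because it can address inherent problems in autonomous driving such as occlusion and the sparse perception of distant targets. Like co-salient object detection, collaborative perception requires the joint processing of multiple sets of data to improve perception performance under a single view. We therefore apply the ideas and techniques of co-salient object detection to collaborative perception, selecting and modulating the collaborative information to reduce the volume of communicated data while preserving the overall performance of the model. This thesis studies spatial modulation methods in co-salient object detection, the hierarchical interaction of features within the model, and the application of the related techniques to collaborative perception. The work consists of three parts: (1) Co-salient object detection based on similarity activation maps. Spatial modulation of features makes the model focus on the target regions. Some existing co-salient object detection models borrow methods from object tracking and use pixel-level similarity to obtain spatial modulation maps. Feature visualization analysis shows that such modulation maps have limited effect. Inspired by research on class activation maps, the model introduces a similarity activation module that uses gradient information to generate spatial modulation maps, which effectively improves performance. The study also introduces edge information into co-salient object detection; experimental results show that the proposed edge fusion module effectively improves the performance of different base models. (2) Co-salient object detection based on hierarchical interaction pooling. Intra-group consistency and inter-group separability of features are two key properties of co-salient object detection. Because co-salient object detection processes images in groups, a pyramid pooling interaction module is designed that, based on recombinant convolution, fully exploits the consistency of features across images under different receptive fields. A four-branch network training structure is also designed, together with a coherence confirmation module. Using contrastive learning and no category labels, this structure effectively separates features between groups. Compared with methods that use extra supervision such as category labels, the proposed algorithm performs better on existing public datasets. (3) Application of feature selection, modulation, and fusion techniques from co-salient object detection to collaborative perception. Collaborative perception relies on information exchange and can effectively address problems such as occlusion under a single view. In the course of the research, however, communication bandwidth constraints limited model performance to some degree. Given the similarity between the two tasks and the relative maturity of research on co-salient object detection, the experiments use a feature selection module and spatial feature activation maps to modulate vehicle information and spatial features. Threshold-based feature screening is used to reduce the volume of communicated data. In addition, recombinant convolution is used for the interaction and fusion of information from different sources. The analysis shows that, with few additional parameters and little extra computation, the designed modules effectively improve model performance and lower the communication bandwidth requirement.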
English abstract	Co-salient object detection is a task derived from salient object detection that aims to identify the salient regions common to a set of images. Because it can serve as a pre-processing step for vision tasks such as video analysis and image restoration, it has long been an active research topic. In recent years, with the rapid development of deep learning, the performance of co-salient object detection models has improved continuously. Some existing models use spatial modulation based on pixel-level correlation and auxiliary classification supervision to enhance performance. However, these methods introduce problems: the modulation maps have limited activation regions, and the category information leads to overfitting. Moreover, collaborative perception has gained considerable attention in recent years because it can address the inherent problems of occlusion and sparse perception of long-range targets in autonomous driving. Like co-salient object detection, collaborative perception requires the cooperative processing of multiple sets of data to improve perception performance under a single view. Therefore, we apply the ideas and techniques of co-salient object detection to collaborative perception, aiming to reduce the volume of communicated data while preserving the overall performance of the model. This thesis focuses on spatial modulation methods in co-salient object detection, the hierarchical interaction of features, and the application of the related techniques to collaborative perception. The thesis consists of three main parts: (1) Research on similarity activation maps for co-salient object detection. Spatial modulation of features is an effective way for co-salient object detection models to focus on the target regions. 
However, some existing models use pixel-level similarity to obtain spatial modulation maps, borrowing from methods used in object tracking. Feature visualization analysis shows that such modulation maps have limited effect. To address this issue, the proposed model draws inspiration from class activation maps and introduces a similarity activation module that generates spatial modulation maps from gradient information. This approach effectively improves the model's performance. Moreover, the study introduces edge information into co-salient object detection; experimental results show that the proposed edge fusion module effectively improves the performance of different base models. (2) Research on hierarchical interaction pooling for co-salient object detection. In co-salient object detection, two key properties of features are intra-group compactness and inter-group separability. Because image groups are the processing unit, we design a pyramid pooling interaction module based on recombinant convolution. This module fully exploits the compactness of features across images under different receptive fields. Additionally, we design a four-branch network training structure with a coherence confirmation module. Using contrastive learning, this structure effectively separates inter-group features without relying on category labels. The proposed model performs better on existing public datasets than methods that use additional supervision such as category labels. (3) Application of feature selection, modulation, and fusion techniques from co-salient object detection to collaborative perception. Collaborative perception is a promising approach to overcoming the limitations of single-view perception, such as occlusion. However, the performance of collaborative perception models can be limited by communication bandwidth constraints. 
To alleviate this issue, we propose a connected autonomous vehicle selection module and an activation map learning method to modulate vehicle information and spatial features. To further reduce the volume of communicated data, we implement threshold-based feature screening. Additionally, we use recombinant convolution for the interaction and fusion of information from different sources. These modules draw on relevant methods from co-salient object detection. The results show that the proposed modules improve model performance and reduce communication bandwidth requirements while adding few extra parameters and little extra computation.
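As a rough illustration of the threshold-based feature screening mentioned in the abstract, the sketch below keeps only the spatial locations whose activation score reaches a threshold and returns them as sparse (index, feature vector) pairs, so that fewer values would need to be transmitted. This is not the thesis's implementation: the function name `screen_features`, the flat per-location data layout, and the threshold value 0.5 are all assumptions made for the example.

```python
def screen_features(features, activation, threshold=0.5):
    """Keep only locations whose activation score reaches the threshold.

    features:   list of per-location feature vectors (hypothetical layout)
    activation: list of per-location scores, assumed to lie in [0, 1]
    Returns a sparse payload of (location index, feature vector) pairs.
    """
    if len(features) != len(activation):
        raise ValueError("features and activation must have the same length")
    return [
        (index, vector)
        for index, (vector, score) in enumerate(zip(features, activation))
        if score >= threshold
    ]


# Toy data: four spatial locations with two feature channels each.
features = [[0.2, 0.8], [0.5, 0.1], [0.9, 0.3], [0.4, 0.4]]
activation = [0.9, 0.1, 0.4, 0.7]

payload = screen_features(features, activation, threshold=0.5)
kept_ratio = len(payload) / len(features)  # fraction of locations transmitted
```

In this toy case only locations 0 and 3 pass the threshold, so half of the locations would be sent. Raising the threshold shrinks the payload further, at the cost of discarding more information.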
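Keywords	co-salient object detection; feature spatial modulation; intra-group consistency; inter-group separability; collaborative perception
Language	Chinese
Seven major directions: sub-direction classification	Object detection, tracking, and recognition
State Key Laboratory planned direction classification	Visual information processing
Is there a thesis-associated dataset to deposit	No
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/51909
Collection	Graduates_Master's Theses
Recommended citation (GB/T 7714)	Wang Yu (王煜). 联合显著性检测技术及其应用研究 [Research on Co-Salient Object Detection Technology and Its Applications][D], 2023.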

Files in this item
File name/size	Document type	Version type	Access type	License
王煜_硕士毕业论文.pdf (12172KB)	Thesis		Restricted access	CC BY-NC-SA