CASIA OpenIR > Graduates > Doctoral Dissertations
Research on Object Detection Algorithms for Cross-Domain Scenarios (面向跨域场景的目标检测算法研究)
Author: 田鲲 (Tian Kun)
Date: 2024-05-14
Pages: 146
Subtype: Doctoral dissertation (博士)
Abstract

Object detection is a fundamental task in computer vision, with significant research and practical value. On the one hand, the object categories and location coordinates produced by detection models are basic inputs for higher-level visual tasks such as instance segmentation and object tracking. On the other hand, object detection models are indispensable modules in real-world applications such as autonomous driving, intelligent surveillance, and online education. In recent years, deep learning-based object detection methods have made remarkable progress. However, the supervised learning paradigm incurs substantial data annotation costs, and models trained this way generalize poorly to new, unlabeled scenarios. Unsupervised domain adaptive detection methods aim to transfer the knowledge a model learns from a labeled source domain to an unlabeled target domain, thereby improving the detector's cross-domain performance at a low training cost.

Although existing unsupervised object detection methods for cross-domain scenarios have improved the accuracy of source-domain models on the target domain through adversarial training, image transformation, or auxiliary learning tasks, cross-domain object detection still faces the following open problems: 1) how to design a better feature alignment mechanism that copes with negative transfer between domains; 2) how to improve the prediction robustness of the detection model itself while reducing its dependence on target-domain data; and 3) how to construct better auxiliary training tasks that provide more discriminative object features to the downstream detection branch. To address these problems, this dissertation studies object detection methods for cross-domain scenarios within the deep learning framework, covering model construction, training, optimization, and validation of effectiveness. The main research contents and contributions are summarized as follows:

1. A domain adaptive object detection method based on category semantic priors is proposed. One cause of the "performance collapse" of detection models is the distribution discrepancy between features extracted from different data domains. Existing domain adaptation methods face the challenge of negative transfer when aligning the feature distributions of the source and target domains. First, foreground object features and background region features may be erroneously aligned with each other. Second, different foreground features may suffer from category confusion. The core idea of the proposed method is to narrow the distribution gap between the source and target domains while preserving the discriminability of category features. Specifically, the method first constructs foreground and background classifiers shared by both domains. By optimizing their classification boundaries, these classifiers adaptively aggregate foreground features from different domains and guide the model to distinguish foreground from background, thereby alleviating negative transfer between the two. Second, the method models fine-grained foreground category features: a dynamic feature storage module records and updates high-quality feature components extracted during training, so that the historical information learned by the model is used more effectively. Finally, category relationship knowledge under the source-domain, target-domain, and cross-domain settings is transferred to mitigate negative transfer among foreground category features. Experiments on public cross-domain benchmarks show that the proposed method fully exploits the category semantic knowledge shared by the source and target domains and effectively overcomes the negative transfer caused by adversarial training.
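The dynamic feature storage module and the transfer of category relationship knowledge can be illustrated with a small sketch. The abstract does not give the update rule or the relation measure, so the following is only a minimal sketch under stated assumptions: one prototype vector per class updated by an exponential moving average (the `momentum` value is assumed), "high-quality" approximated by a hypothetical confidence threshold `min_score`, class relations measured by cosine similarity between prototypes, and relation transfer expressed as a mean squared difference between two relation matrices. The names `PrototypeMemory` and `relation_alignment_loss` are hypothetical.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


class PrototypeMemory:
    """Hypothetical per-class feature memory, an assumed stand-in for a
    dynamic feature storage module (not the dissertation's definition)."""

    def __init__(self, num_classes: int, dim: int,
                 momentum: float = 0.9, min_score: float = 0.8):
        self.protos = [[0.0] * dim for _ in range(num_classes)]
        self.initialized = [False] * num_classes
        self.momentum = momentum    # assumed EMA momentum
        self.min_score = min_score  # assumed confidence gate for "high-quality" features

    def update(self, feature: List[float], label: int, score: float) -> bool:
        """Write one instance feature; low-confidence features are skipped."""
        if score < self.min_score:
            return False
        if not self.initialized[label]:
            # The first accepted feature initializes the class prototype.
            self.protos[label] = list(feature)
            self.initialized[label] = True
        else:
            # Exponential moving average keeps historical information.
            m = self.momentum
            self.protos[label] = [m * p + (1.0 - m) * f
                                  for p, f in zip(self.protos[label], feature)]
        return True

    def relation_matrix(self) -> List[List[float]]:
        """Pairwise cosine similarity between class prototypes."""
        n = len(self.protos)
        return [[cosine(self.protos[i], self.protos[j]) for j in range(n)]
                for i in range(n)]


def relation_alignment_loss(rel_a: List[List[float]],
                            rel_b: List[List[float]]) -> float:
    """Mean squared difference between two class-relation matrices."""
    n = len(rel_a)
    total = sum((rel_a[i][j] - rel_b[i][j]) ** 2
                for i in range(n) for j in range(n))
    return total / (n * n)


# Usage: source features use ground-truth labels; target labels would be pseudo-labels.
source = PrototypeMemory(num_classes=2, dim=2, momentum=0.5)
source.update([1.0, 0.0], label=0, score=0.9)
source.update([0.0, 1.0], label=1, score=0.95)
target = PrototypeMemory(num_classes=2, dim=2, momentum=0.5)
target.update([1.0, 0.0], label=0, score=0.9)
target.update([1.0, 0.0], label=1, score=0.9)  # two classes collapsed onto each other
print(relation_alignment_loss(source.relation_matrix(), target.relation_matrix()))  # prints 0.5
```

In a detector, a term like this would be one loss among several. A cross-domain memory (prototypes mixing both domains) could be compared in the same way. The EMA keeps each prototype from being dominated by a single noisy batch, which is one plausible reading of "leveraging learned historical information".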

2. A domain adaptive object detection method based on multi-level consistency regularization is proposed. The low prediction robustness of detection models on style-augmented images is another cause of their "performance collapse". Most existing domain adaptation methods reduce the distribution discrepancy between data domains in feature space from the perspective of feature alignment, overlooking the lack of effective training regularization for the source-domain model. The core idea of the proposed method is to optimize the training process of the detection model itself and improve its robustness in feature extraction, classification prediction, and regression prediction. Specifically, the method first builds an augmented image stream from source-domain data and designs an adaptive supervision consistency regularization to reduce the interference of augmentation noise with detector training. Second, classification and regression consistency regularizations further improve the robustness of the detector's recognition and localization predictions. Finally, a feature extraction consistency regularization stabilizes the feature extraction stage by guiding the detector to focus on more salient spatial regions and channels. Experiments on public cross-domain benchmarks show that the proposed multi-level consistency regularizations can be combined with one another and achieve competitive cross-domain performance without using any target-domain data. Moreover, combining this method with the previously proposed method based on category semantic priors further improves the domain adaptation performance of the detection model.
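The classification, regression, and feature-level consistency terms can be sketched as plain functions over the outputs of two views of the same source image. This is a minimal sketch, not the dissertation's formulation. It assumes that the augmentation is style-only, so each proposal keeps the same box in both views and predictions can be matched one-to-one. It also picks symmetric KL divergence, smooth L1, and a channel-averaged spatial attention map as the distance measures. The adaptive supervision weighting and the channel part of the feature-level term are not modeled, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_div(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) with a small epsilon to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cls_consistency(logits_orig, logits_aug) -> float:
    """Symmetric KL between the class distributions of the two views."""
    p, q = softmax(logits_orig), softmax(logits_aug)
    return 0.5 * (kl_div(p, q) + kl_div(q, p))


def smooth_l1(x: float, beta: float = 1.0) -> float:
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


def reg_consistency(deltas_orig, deltas_aug, beta: float = 1.0) -> float:
    """Mean smooth-L1 distance between box-regression outputs of the two views."""
    pairs = list(zip(deltas_orig, deltas_aug))
    return sum(smooth_l1(a - b, beta) for a, b in pairs) / len(pairs)


def spatial_attention(fmap) -> List[List[float]]:
    """fmap: C x H x W nested lists -> H x W map of mean |activation|, summing to 1."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    att = [[sum(abs(fmap[k][y][x]) for k in range(c)) / c for x in range(w)]
           for y in range(h)]
    total = sum(map(sum, att)) or 1.0  # guard against an all-zero map
    return [[v / total for v in row] for row in att]


def feat_consistency(fmap_orig, fmap_aug) -> float:
    """Mean absolute difference between the spatial attention maps of the two views."""
    a, b = spatial_attention(fmap_orig), spatial_attention(fmap_aug)
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


# One matched proposal, seen in the original and the style-augmented image.
print(cls_consistency([2.0, 0.5, -1.0], [2.0, 0.5, -1.0]))          # 0.0
print(reg_consistency([0.0, 0.0, 0.0, 0.0], [0.5, 0.0, 0.0, 2.0]))  # 0.40625
```

During training, such terms would be added to the ordinary detection loss with weights (hypothetical hyperparameters). No target-domain image enters the computation, which is consistent with the abstract's point that this regularization works without target-domain data. Normalizing the attention map makes the feature term insensitive to a global rescaling of activations, which style changes often cause.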

3. A domain adaptive object detection method based on a pixel perception auxiliary task is proposed. The failure of detection models to extract discriminative object features is yet another cause of their "performance collapse". Existing domain adaptation methods use parameter decoupling, multi-label recognition, or vector decomposition as auxiliary learning tasks, aiming to enhance the model's transferability while maintaining its discriminability. However, such methods optimize the feature extraction process only implicitly and therefore improve the training of the downstream detection task only indirectly. The core idea of the proposed method is to extend the perception ability of the detection model and explicitly guide it to extract object-related, discriminative features, so that it understands object attributes such as category, location, and shape more accurately. Specifically, the method first introduces a fine-grained pixel perception auxiliary task without increasing labeling costs. Second, semantic extraction modules combine the contextual features of foreground objects with the discriminative features located at their geometric centers. Finally, semantic fusion modules supplement the original image features with the extracted object prior knowledge, providing refined and discriminative object features for the downstream detection task. Experiments on public cross-domain benchmarks show that this method can be effectively combined with the previously proposed methods. In some cross-domain scenarios, its performance approaches or exceeds the upper bound set by models trained on the target domain.
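The abstract states that the pixel perception task adds no labeling cost, which implies that its targets come from annotations the source domain already has. The sketch below shows one hypothetical way to build such targets by rasterizing bounding boxes into two maps. The first is a per-pixel class mask. The second is a Gaussian heatmap peaked at each box's geometric center, echoing the center-located features that the semantic extraction modules use. The function name, mask encoding, overlap rule, and `sigma` are all assumptions. The semantic extraction and fusion modules themselves are not sketched.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixel coordinates


def box_to_pixel_targets(boxes: Sequence[Box], labels: Sequence[int],
                         height: int, width: int, sigma: float = 1.0
                         ) -> Tuple[List[List[int]], List[List[float]]]:
    """Derive pixel-level targets from existing box annotations (hypothetical scheme).

    Returns (class_mask, center_heatmap):
      class_mask[y][x]     = 0 for background, label + 1 inside a box
      center_heatmap[y][x] = max over boxes of a Gaussian centered on the box center
    """
    mask = [[0] * width for _ in range(height)]
    heat = [[0.0] * width for _ in range(height)]
    # Paint larger boxes first so that smaller boxes win where boxes overlap.
    order = sorted(range(len(boxes)),
                   key=lambda i: -(boxes[i][2] - boxes[i][0]) * (boxes[i][3] - boxes[i][1]))
    for i in order:
        x1, y1, x2, y2 = boxes[i]
        for y in range(max(0, math.floor(y1)), min(height, math.ceil(y2))):
            for x in range(max(0, math.floor(x1)), min(width, math.ceil(x2))):
                mask[y][x] = labels[i] + 1
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0  # geometric center of the box
        for y in range(height):
            for x in range(width):
                # Squared distance from the pixel center (x + 0.5, y + 0.5) to the box center.
                d2 = (x + 0.5 - cx) ** 2 + (y + 0.5 - cy) ** 2
                heat[y][x] = max(heat[y][x], math.exp(-d2 / (2.0 * sigma ** 2)))
    return mask, heat


# Two labeled boxes on a 4 x 4 grid; no annotation beyond the boxes is needed.
mask, heat = box_to_pixel_targets([(0, 0, 2, 2), (2, 1, 4, 4)], [0, 1], height=4, width=4)
for row in mask:
    print(row)
# [1, 1, 0, 0]
# [1, 1, 2, 2]
# [0, 0, 2, 2]
# [0, 0, 2, 2]
```

An auxiliary head could then be trained to predict these maps alongside detection. Because box interiors only approximate object shape, a real scheme may refine or soften the mask; this sketch keeps the simplest version.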

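Keywords: object detection; domain adaptation; category semantic knowledge; consistency regularization; auxiliary learning task
Language: Chinese
Document Type: Dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/56514
Collection: Graduates_Doctoral Dissertations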
Recommended Citation (GB/T 7714):
田鲲. 面向跨域场景的目标检测算法研究[D], 2024.
(Tian Kun. Research on Object Detection Algorithms for Cross-Domain Scenarios [D], 2024.)
Files in This Item:
面向跨域场景的目标检测算法研究.pdf (20235 KB); Document type: dissertation; Access: restricted; License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.