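Research on Weakly Supervised Person Search Methods Based on Feature Learning and Pseudo-Label Prediction (基于特征学习和伪标签预测的弱监督行人搜索方法研究)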
王本智
2024-05
Pages: 76
Subtype: Master's
Abstract

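Person search aims to unify pedestrian detection and person re-identification in a single framework, locating each pedestrian while also providing identity features, so as to improve system performance and efficiency. Because it is a joint task of pedestrian detection and re-identification, supervised training requires annotating both pedestrian locations and pedestrian identities, which raises the annotation cost. To address the high annotation cost of supervised person search, this thesis studies how to perform person search efficiently in a weakly supervised setting, where "weakly supervised" means that person search is learned from pedestrian location annotations alone. This significantly reduces annotation cost and helps raise the practical value of person search. The main contributions of this thesis are as follows: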

1. An end-to-end weakly supervised person search framework based on scale-invariant feature learning is proposed.

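Because a pedestrian may be captured by different cameras at different times and in different scenes, the distance between the pedestrian and the camera varies, so the same person usually appears at different image scales, which makes matching harder. For example, the feature similarity of the same person drops across different scales, while the feature similarity between different people rises at similar scales. In addition, small-scale images carry less information about the person, which degrades the accuracy of pseudo-label prediction. To address the scale problem in end-to-end weakly supervised person search, this thesis proposes an end-to-end weakly supervised person search framework based on scale-invariant feature learning, consisting of a scale-invariant feature learning module and a dynamic multi-label learning module. The scale-invariant feature learning module introduces a self-similarity-driven scale-invariant loss function and constructs cross-scale positive and negative sample pairs, pushing the model to extract scale-invariant features of pedestrians. To reduce label prediction errors caused by poorly chosen thresholds, the thesis also proposes a dynamic multi-label learning module, which treats pseudo-label prediction as a multi-label classification problem and completes it progressively through dynamic threshold adjustment.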
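As an illustration of what a loss over cross-scale positive and negative pairs can look like, here is a minimal sketch. It is not the thesis's loss: the abstract does not give the formula of the self-similarity-driven scale-invariant loss, so an InfoNCE-style contrastive form is assumed, and the names `cosine`, `cross_scale_contrastive_loss`, and `temperature` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_scale_contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """Assumed InfoNCE-style loss (hypothetical stand-in for the thesis's loss).

    anchor:    feature of a person crop at one scale
    positive:  feature of the same crop at another scale
    negatives: features of other people, at any scale
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy 2-D features: the loss is small when the cross-scale positive is close
# to the anchor and large when a different identity is closer instead.
aligned = cross_scale_contrastive_loss([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0]])
misaligned = cross_scale_contrastive_loss([1.0, 0.0], [0.0, 1.0], [[0.9, 0.1]])
```

Minimizing a loss of this shape pulls features of the same person at different scales together and pushes features of different people apart, which is the behavior the paragraph above describes.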

2. A two-stage weakly supervised person search framework based on multi-layer adaptation fine-tuning is proposed.

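Pedestrian detectors are relatively lenient in deciding what counts as a positive sample, so the bounding boxes they output may be suboptimal for the downstream person re-identification task: a box may miss key discriminative information about the person or include distracting content. To address this mismatch between detection results and the person re-identification model in two-stage weakly supervised person search, this thesis proposes a two-stage weakly supervised person search framework based on multi-layer adaptation fine-tuning, consisting of a multi-layer adaptation fine-tuning module and a multi-level pseudo-label prediction module. The multi-layer adaptation fine-tuning module inserts dedicated adapters into each layer of a pre-trained person re-identification model and fine-tunes them, so that the model still outputs discriminative identity features when its input is an image cropped from a detection result. The thesis further proposes a multi-level knowledge distillation loss function that constrains the intermediate features the re-identification model produces on detection-cropped inputs, yielding more discriminative identity features. In addition, earlier pseudo-label prediction methods usually trade precision against recall to reach good performance, which sacrifices one of the two. To address this, the thesis proposes a multi-level pseudo-label prediction module that raises pseudo-label precision while preserving recall, producing higher-quality pseudo-labels for model training.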
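The adapter idea above can be pictured with a small sketch. The abstract does not describe the adapter's internal structure, so a common bottleneck design with a residual connection is assumed here; `matvec`, `relu`, `adapter`, and the weight matrices `down` and `up` are hypothetical names.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(vector):
    """Element-wise ReLU."""
    return [max(0.0, x) for x in vector]


def adapter(features, down, up):
    """Assumed bottleneck adapter: features + up(relu(down(features))).

    Only `down` and `up` would be trained; the surrounding pre-trained
    layer is assumed to stay frozen.
    """
    hidden = relu(matvec(down, features))
    delta = matvec(up, hidden)
    return [f + d for f, d in zip(features, delta)]


features = [1.0, -2.0, 0.5]
down = [[0.5, 0.0, 0.0], [0.0, 0.0, 1.0]]      # 3 -> 2 bottleneck
up_zero = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # zero-initialised 2 -> 3
up_tuned = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]

unchanged = adapter(features, down, up_zero)   # equals the input features
adjusted = adapter(features, down, up_tuned)   # small learned correction
```

With a zero-initialised up-projection the adapter starts as the identity, so fine-tuning begins from the pre-trained model's behavior and only gradually adapts it to detection crops. This initialisation is a design choice of the sketch, not something the abstract states.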

Other Abstract

Person search aims to unify pedestrian detection and person re-identification within a single framework, locating pedestrians while providing their identity features, in order to improve system performance and efficiency. Because person search is a joint task of pedestrian detection and re-identification, supervised training requires labeling both pedestrian locations and identities, which increases the cost of data labeling. To address the high labeling cost of supervised person search, this thesis studies how to complete the person search task efficiently under a weakly supervised setting, in which person search is learned from pedestrian location labels alone. This significantly reduces the labeling cost and improves the practical value of person search. The main contributions of this thesis are as follows:

1. An end-to-end weakly supervised person search framework based on scale-invariant feature learning is proposed.

Pedestrians may be captured by different cameras at different times and in different scenes, so their distance from the camera varies. As a result, the same pedestrian often appears at different image scales, which makes matching more challenging. For example, the feature similarity of the same pedestrian decreases across different scales, while the feature similarity between different pedestrians increases at similar scales. In addition, smaller-scale images contain less information about the pedestrian, which reduces the accuracy of pseudo-label prediction.
To address the scale issue faced by end-to-end weakly supervised person search, this thesis proposes an end-to-end weakly supervised person search framework based on scale-invariant feature learning. The framework comprises a scale-invariant feature learning module and a dynamic multi-label learning module. The scale-invariant feature learning module encourages the model to extract scale-invariant features of pedestrians by introducing a self-similarity-driven scale-invariant loss function and constructing cross-scale positive and negative sample pairs. To mitigate incorrect label prediction caused by improper threshold settings, the thesis also introduces a dynamic multi-label learning module. This module treats pseudo-label prediction as a multi-label classification problem and completes pseudo-label prediction progressively by adjusting the threshold dynamically.
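To make the idea of multi-label pseudo-labels with a dynamic threshold concrete, here is a minimal sketch under stated assumptions. The abstract does not specify how the threshold changes, so a linear schedule from a strict to a looser value is assumed; `dynamic_threshold`, `multi_label_targets`, and the `start` and `end` values are hypothetical.

```python
def dynamic_threshold(epoch, total_epochs, start=0.9, end=0.6):
    """Assumed linear schedule: strict early in training, looser later."""
    fraction = epoch / max(total_epochs - 1, 1)
    return start + (end - start) * fraction


def multi_label_targets(similarities, threshold):
    """Mark every candidate whose similarity reaches the threshold as a
    positive, so one query can have several positives (multi-label)."""
    return [1 if s >= threshold else 0 for s in similarities]


# Similarities between one unlabeled person feature and three candidates.
similarities = [0.95, 0.7, 0.3]

early = multi_label_targets(similarities, dynamic_threshold(0, 10))
late = multi_label_targets(similarities, dynamic_threshold(9, 10))
```

In this sketch only the most confident match is labeled positive at the start of training, and a second candidate is added once the threshold has relaxed. The opposite schedule (loose to strict) would fit the same interface.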

2. A two-stage weakly supervised person search framework based on multi-layer adaptation fine-tuning is proposed.

Pedestrian detectors are relatively lenient in determining positive samples, so their bounding boxes may be suboptimal for the downstream person re-identification task: key discriminative information about the person may be lost, or interfering information may be introduced within the bounding box. To address this mismatch between detection results and the person re-identification model in two-stage weakly supervised person search, this thesis proposes a two-stage weakly supervised person search framework based on multi-layer adaptation fine-tuning, which includes a multi-layer adaptation fine-tuning module and a multi-level pseudo-label prediction module. The multi-layer adaptation fine-tuning module inserts dedicated adapters into the layers of the pre-trained person re-identification model and fine-tunes them, enabling the model to output discriminative identity features when its input is an image cropped from a detection result. The thesis further introduces a multi-level knowledge distillation loss function that constrains the intermediate features the person re-identification model generates on detection-cropped inputs, yielding more discriminative identity features. In addition, previous pseudo-label prediction methods often trade precision against recall to achieve better performance, at the cost of reduced recall or precision. To address this issue, the thesis proposes a multi-level pseudo-label prediction module that improves the precision of pseudo-labels while maintaining their recall, generating higher-quality pseudo-labels for training.
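A multi-level distillation loss of the kind mentioned above can be sketched as a weighted average of per-layer feature distances. This is an illustration, not the thesis's formulation: the abstract does not say what supplies the reference features or which distance is used, so reference features from a frozen copy of the model and a mean-squared-error distance are both assumptions, and `mse`, `multi_level_distillation_loss`, and `weights` are hypothetical names.

```python
def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multi_level_distillation_loss(student_feats, teacher_feats, weights=None):
    """Weighted average of per-layer MSE between intermediate features.

    student_feats: one feature vector per layer, from detection-cropped input
    teacher_feats: matching reference features (assumed frozen teacher)
    """
    if weights is None:
        weights = [1.0] * len(student_feats)
    total = sum(
        w * mse(s, t) for w, s, t in zip(weights, student_feats, teacher_feats)
    )
    return total / sum(weights)


# Two layers: the first matches the reference, the second deviates.
student = [[1.0, 2.0], [3.0, 4.0]]
teacher = [[1.0, 2.0], [3.0, 6.0]]
loss = multi_level_distillation_loss(student, teacher)
```

Constraining several layers rather than only the final embedding is what makes the loss multi-level; the per-layer `weights` let a sketch like this emphasize deeper or shallower layers.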
 

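Keyword: person search, person re-identification, weakly supervised learning, metric learning, pseudo-label prediction
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/57177
Collection: 毕业生_硕士学位论文 (Graduates: Master's Theses)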
Recommended Citation
GB/T 7714
王本智. 基于特征学习和伪标签预测的弱监督行人搜索方法研究[D],2024.
Files in This Item:
File Name/Size: (已压缩)王本智_毕业论文-终稿.pdf (3714KB)
DocType: Thesis
Access: Restricted
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.