Person Representation and Recognition in Visual Scenes (视觉场景中的行人表征与识别)


	Person Representation and Recognition in Visual Scenes
	Luo Chuanchen (罗传琛)
	2023-05-20
Pages	114
Degree type	Doctoral
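Abstract (translated from Chinese)	As smart-city construction steadily advances, the number of surveillance cameras deployed across all kinds of sites is growing rapidly. Against this background, retrieving and processing surveillance video manually is impractical, and how to automate surveillance with computer vision has become a research focus. As a key component of intelligent video surveillance, person re-identification has received wide attention from academia and industry. It aims to retrieve, across multiple non-overlapping camera views, other images that belong to the same person as a given query image. The key to person re-identification is learning a feature space that is sufficiently compact within classes and sufficiently separated between classes. However, drastic changes in camera viewpoint, human pose, ambient illumination, and background can give images of the same person markedly different visual appearances, which poses a great challenge to the model's discriminative ability. In addition, heavy reliance on annotated data limits the deployment of person re-identification models in real-world scenarios. To overcome these difficulties, this thesis explores how to improve the discriminability and generalization of person re-identification from three aspects. The main contributions are as follows.
1. A feature-aggregation-based method for fully supervised person re-identification, which effectively strengthens the network's ability to cope with intra-class variation. Intra-class variation is the main factor affecting discriminative performance. To suppress it, existing methods usually either classify samples individually or constrain pairwise similarity among a small number of samples. Such methods ignore the relationships among the samples in a batch and therefore tend to fall into suboptimal local optima. This study instead treats the samples in a batch as a densely connected graph and aggregates features with a random walk process, which alleviates classification confusion and thereby counters intra-class variation. The study further extends this idea to the post-processing stage of person retrieval and to vision tasks of different dimensionalities. Experiments show that feature aggregation significantly improves performance across stages and dimensionalities, confirming the effectiveness and generality of the idea.
2. A dual-discrepancy-bridging method for cross-domain person re-identification, which aims to reduce the impact of inter-domain discrepancy and intra-domain inter-camera discrepancy on transfer performance. The analysis in this study finds that these two discrepancies jointly cause the poor transferability of person re-identification models. To handle inter-camera discrepancy within a domain, this work proposes a camera-aware neighborhood consistency constraint, which avoids the excessive preference for same-camera candidates found in the ordinary neighborhood consistency constraint. To mitigate inter-domain discrepancy, this study proposes a cross-domain mixup mechanism that smooths the transition from the source domain to the target domain. With the two components working together, the proposed model achieves leading performance on multiple cross-domain person re-identification benchmarks.
3. A person re-identification method based on 3D virtual generation, which synthesizes virtual data at no cost as the training source domain and avoids the use of expensive annotated data. This study proposes a pose-controllable 3D generative model that guides a tri-plane representation through rasterization and uses differentiable tetrahedra to output a 3D mesh directly. By manipulating pose and camera viewpoint, the model can generate diverse person images. Based on this synthetic data, the study further applies a cross-domain person re-identification algorithm to transfer from the virtual domain to the real domain, thereby obtaining a high-performance person re-identification model without annotated data.
In summary, this thesis focuses on the intra-class variation and generalization problems of person re-identification. First, it studies discriminability in the fully supervised setting, where the core issue is suppressing the influence of intra-class variation. Second, it explores the generalization of person re-identification models, analyzes the dual discrepancies in the cross-domain setting, and designs corresponding modules to address them. Finally, it attempts to learn a transferable person re-identification model without expensive annotations, using 3D synthesis to generate virtual data as the source-domain training data. Experiments show that the proposed modules all bring significant performance gains and that the proposed models achieve leading performance on the corresponding benchmark datasets.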
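The first contribution treats a training batch as a densely connected graph and aggregates features with a random walk. The abstract does not give the formulation, so the following is only a minimal sketch under assumptions: cosine similarity as edge affinity, a softmax with a hypothetical `temperature` to obtain the transition matrix, and a hypothetical `steps` parameter for the number of walk steps. The function names are illustrative and are not taken from the thesis.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def transition_matrix(features, temperature=0.1):
    """Row-stochastic transition matrix of a random walk on the batch graph.

    Assumption: edge affinity is cosine similarity, turned into transition
    probabilities by a row-wise softmax with the given temperature.
    """
    unit = [l2_normalize(f) for f in features]
    rows = []
    for a in unit:
        logits = [sum(x * y for x, y in zip(a, b)) / temperature for b in unit]
        peak = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(z - peak) for z in logits]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def aggregate_features(features, temperature=0.1, steps=1):
    """Replace each feature by the transition-weighted average of the batch."""
    if not features:
        return []
    probs = transition_matrix(features, temperature)
    dim = len(features[0])
    current = [list(f) for f in features]
    for _ in range(steps):
        current = [
            [sum(p * f[k] for p, f in zip(row, current)) for k in range(dim)]
            for row in probs
        ]
    return current
```

Calling `aggregate_features(batch)` on a list of equal-length feature vectors returns a list of the same shape. In this sketch, samples with high mutual similarity are pulled toward one another, which is the intuition behind reducing intra-class variation.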
English abstract	With the steady advancement of smart-city initiatives, the surveillance camera networks deployed in public and private places keep growing. In this context, it is impractical to retrieve and process surveillance video data manually, and recent research focuses on employing computer vision to automate the surveillance pipeline. As a critical component of intelligent surveillance systems, person re-identification (re-ID) has drawn extensive attention from academia and industry. Given a query image of a person, re-ID aims to retrieve other images of the same person across multiple non-overlapping camera views. The key to person re-identification is to learn a feature space with intra-class compactness and inter-class separation. However, drastic changes in camera viewpoint, illumination, background, and human pose can give images of the same person dramatically different visual appearances, which greatly challenges the discrimination ability of the re-ID model. In addition, heavy reliance on annotated data limits the deployment of person re-identification models in real scenarios. To overcome these issues, this thesis explores improving the discrimination and generalization of person re-identification from three aspects.
1. This thesis proposes a feature aggregation method for fully supervised person re-identification, which effectively improves the robustness of the model to intra-class variation. Intra-class variation is the main challenge to the discrimination ability of the model. To alleviate it, existing methods either classify samples individually or constrain the pairwise similarity between a small number of samples. Such methods ignore the dense connections between input samples, so they tend to fall into suboptimal local optima. To this end, this work formulates a training batch as a densely connected graph and aggregates features according to a random walk process on the graph. This practice alleviates confusion around the decision boundary and enhances robustness to intra-class variation. Furthermore, this work extends the idea to the post-processing stage of person re-identification and to vision tasks of other dimensionalities. Experimental results show that feature aggregation improves performance in all of these stages and dimensionalities, which verifies the effectiveness and versatility of the idea.
2. This thesis proposes a dual discrepancy adaptation method for cross-domain person re-identification to alleviate the adverse effects of inter-domain shift and inter-camera discrepancy. This work reveals that the low transferability of person re-identification models is attributable to both. To handle inter-camera discrepancy within the target domain, this work proposes a camera-aware neighborhood consistency constraint, which avoids the bias toward intra-camera candidates in vanilla neighborhood consistency constraints. To cope with inter-domain shift, this work proposes a cross-domain mixup scheme that smooths the transition from the source domain to the target domain. By combining the two components, the proposed model achieves leading performance on multiple benchmarks for cross-domain person re-identification.
3. This thesis proposes a 3D synthesis method for person re-identification. Rather than training the model on prohibitively expensive annotated data, this work uses synthetic data as the source domain for training. It presents a pose-controllable 3D generative model for virtual data synthesis. Specifically, it guides the tri-plane representation via rasterization and uses Deep Marching Tetrahedra to directly output a 3D mesh. By manipulating the pose and camera perspective, this work can synthesize diverse training samples for person re-identification. It then employs the cross-domain person re-identification algorithm to transfer from the synthetic domain to the real domain. In this way, a transferable person re-identification model is obtained without any identity annotations.
In summary, this thesis first studies the discrimination ability of person re-identification models in a fully supervised setting, where the core is to suppress the adverse effect of intra-class variation. It then explores the generalization problem of person re-identification, reveals the existence of dual discrepancies under the cross-domain setting, and designs corresponding modules to address them. Finally, it attempts to learn a transferable person re-identification model without expensive annotations by applying 3D synthesis techniques to generate virtual data as the source-domain training data. The proposed components consistently improve performance, and the proposed models achieve leading performance on the corresponding benchmarks.
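The cross-domain mixup scheme in the second contribution is described only as smoothing the transition from the source domain to the target domain. A minimal sketch of one plausible reading follows, assuming the standard mixup form: a convex combination of a source sample and a target sample, with the mixing ratio drawn from a Beta distribution. The names `mix_pair` and `cross_domain_mixup` and the parameter `alpha` are hypothetical, and the thesis may schedule or apply the mixing differently.

```python
import random


def mix_pair(source, target, lam):
    """Convex combination of one source sample and one target sample."""
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    return [lam * s + (1.0 - lam) * t for s, t in zip(source, target)]


def cross_domain_mixup(source_batch, target_batch, alpha=1.0, rng=None):
    """Mix paired source/target samples with ratios drawn from Beta(alpha, alpha).

    Assumption: samples are flat lists of floats (e.g. flattened images or
    features) and are paired by position. Returns the mixed samples and the
    ratios, so a training loss could weight source and target terms by them.
    """
    rng = rng or random.Random()
    mixed, ratios = [], []
    for source, target in zip(source_batch, target_batch):
        lam = rng.betavariate(alpha, alpha)
        mixed.append(mix_pair(source, target, lam))
        ratios.append(lam)
    return mixed, ratios
```

A ratio near 1 yields a sample close to the source domain and a ratio near 0 yields one close to the target domain, so intermediate ratios populate the space between the two domains.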
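Keywords	person re-identification; representation learning; feature aggregation; domain adaptation; virtual data synthesis
Language	Chinese
Seven major directions: sub-direction category	Biometric recognition
State Key Laboratory planned direction category	Multimodal collaborative cognition
Thesis-associated dataset to be deposited	No
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/51908
Collection	Graduates_Doctoral dissertations
Recommended citation (GB/T 7714)	罗传琛. 视觉场景中的行人表征与识别[D],2023.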

Files in this item
File name/size	Document type	Version type	Access type	License
Thesis.pdf (7530KB)	Thesis		Restricted access	CC BY-NC-SA