Visual Multi-Object Tracking Based on Relation Modeling

	Visual Multi-Object Tracking Based on Relation Modeling
	张广耀
	2022-06
Pages	0-75
Degree type	Master's
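Abstract (translated from Chinese)	Multi-object tracking is a fundamental task in computer vision; its goal is to detect objects in a video and associate those that share the same identity. A high-performing multi-object tracker benefits tasks such as action recognition and autonomous driving. In practice, multi-object tracking often faces highly complex scenes, with severe occlusion, illumination changes, and similar conditions, which pose serious challenges. Compared with single-object tracking, multi-object tracking must follow multiple targets simultaneously, and multi-view features can also be extracted within each target. To handle multi-object tracking in complex scenes, it is necessary to mine the relations among features at each granularity; modeling these relations yields a more robust tracker. With the development of relation modeling methods, how to analyze the relations among features at multiple granularities, and thereby better formulate and solve the multi-object tracking problem, has become an active research topic in computer vision. This thesis analyzes and models features at multiple granularities, addressing shortcomings of multi-object tracking systems by mining the relations between different views of the same instance, between different instances, and between groups of instances. The three granularities are progressive: from features of different views within an instance, to relations between instances, to relations between groups formed by instances. Specifically, this thesis studies the following three aspects: • Multi-object tracking based on view relation modeling. This thesis analyzes pedestrian tracking in extremely dense scenes and proposes a method that tracks jointly with the head view and the full-body view of each pedestrian: full-body boxes are generated dynamically from head boxes, and denoised person re-identification features assist head-box tracking. By modeling the relation between head features and full-body features, and by deriving the geometric relation between head boxes and full-body boxes, the method exploits both the low occlusion of heads and the highly discriminative features of full-body boxes, achieving the best results on a head tracking dataset. • Multi-object tracking based on instance relation modeling. This thesis analyzes motion models for multi-object tracking and proposes a motion model that models relations among multiple instances with a self-attention mechanism: a network with self-attention and cross-attention iteratively performs relation modeling and motion prediction across instances. By establishing effective self-attention relations among instances, the proposed method reduces confusion in the multi-object motion model in crowded scenes. • Multi-object tracking based on group relation modeling. This thesis analyzes the overall training pipeline of multi-object tracking and proposes a differentiable group relation model. Training can be formulated as matching a group of tracklets against a group of detections; a differentiable matching algorithm closes the earlier inconsistency between training and inference. By modeling the relations between groups of instances and making the matching differentiable, the proposed method learns a better appearance model and improves performance markedly in crowded scenes. This thesis extends the scope of relation modeling in online multi-object tracking, mining relations among features at multiple granularities to better solve multi-object tracking in complex conditions.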
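The instance-level relation modeling mentioned in the abstract can be illustrated with a minimal sketch of self-attention over per-instance features. This is not the thesis's network: it assumes a single attention head with identity projections (no learned weights), it omits the cross-attention and iterative prediction the abstract describes, and the names `softmax`, `self_attention`, and the `[vx, vy]` velocity features are hypothetical choices made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Single-head scaled dot-product self-attention with identity projections.

    features: one vector per tracked instance (hypothetical motion features,
    e.g. [vx, vy]). Returns (outputs, weights), where weights[i][j] is how
    strongly instance i attends to instance j.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in features:
        # Similarity of this instance to every instance, itself included.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in features]
        row = softmax(scores)
        weights.append(row)
        # Each output is an attention-weighted average of all instance features.
        outputs.append(
            [sum(w * value[d] for w, value in zip(row, features)) for d in range(dim)]
        )
    return outputs, weights


# Three hypothetical instances: two moving right together, one moving left.
velocities = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
refined, attention = self_attention(velocities)
```

In this toy input the first instance attends more to its similarly moving neighbour than to the instance moving the opposite way, which is the kind of inter-instance cue a relation-aware motion model could use in a crowd.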
English abstract	Multi-object tracking (MOT) is a fundamental task in computer vision. It aims to detect objects of a specific class in a video and associate objects with the same identity to form trajectories. Solving this task well benefits many downstream tasks, such as action recognition and autonomous driving. However, multi-object tracking systems must deal with complex scenarios in real-world applications, such as severe occlusion and illumination changes, which pose serious challenges. Compared with single-object tracking, MOT involves multiple instances; in addition, multiple views of features can be extracted from the same instance. To deal with multi-object tracking in complex scenarios, feature relation modeling at multiple granularities becomes necessary. With relation modeling, a multi-object tracking system can become more robust to noise and occlusion. In recent years, with the rapid development of relation modeling methods, how to apply relation modeling to MOT at multiple granularities in order to handle complex scenarios has attracted increasing attention. In this thesis, we study relation modeling in MOT at three granularities: intra-identity, inter-identity, and inter-group. We then propose three models that use relation modeling to enhance current MOT systems: • Relation modeling between different views of the same identity. We design a pedestrian head tracking system that fuses full-body appearance features. By dynamically generating full-body bounding boxes from head bounding boxes and using pose-guided person Re-ID features, our proposed method achieves the best result on the Head Tracking 21 dataset, a classical pedestrian tracking dataset in extreme scenarios. • Relation modeling between different identities. We design a self-attention-based motion model that iterates motion prediction and self-attention among multiple tracklets. This motion model handles motion prediction in crowds well. Experimental results show that our method outperforms motion models without a relation modeling module. • Relation modeling between different identity groups. We define the multi-object tracking problem as a group matching problem and propose a differentiable data association method for multi-object tracking. We make the Hungarian method differentiable by designing an effective gradient. With this differentiable Hungarian method, we bridge the gap between MOT training and inference. Our proposed method achieves the best result on a keypoint matching task and outperforms the baseline method on MOT. This thesis extends the application of relation modeling to the online multi-object tracking problem and enhances multi-object tracking in real-world scenarios by adopting relation modeling at different granularities.
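The group matching formulation in the abstract (tracklets on one side, detections on the other) can be sketched as a minimum-cost one-to-one assignment. The example below is a hedged illustration only: it uses exhaustive search over permutations as a stand-in for the Hungarian method (both reach the same optimum, but exhaustive search is only practical for tiny inputs), it shows the discrete matching step rather than the gradient design the thesis proposes, and the function name `match_groups` and the cost values are hypothetical.

```python
from itertools import permutations


def match_groups(cost):
    """Return (assignment, total_cost) minimizing the summed assignment cost.

    cost[i][j] is the cost of assigning tracklet i to detection j; the matrix
    is assumed to be square. assignment[i] is the detection index chosen for
    tracklet i. This brute-force search is a stand-in for the Hungarian
    method, which finds the same optimum in polynomial time.
    """
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total


# Hypothetical appearance distances: 3 tracklets (rows) x 3 detections (columns).
cost = [
    [0.9, 0.1, 0.8],
    [0.2, 0.7, 0.9],
    [0.8, 0.9, 0.3],
]
assignment, total = match_groups(cost)
```

Because the selected assignment changes discontinuously as the costs change, a step like this provides no useful gradient to the model that produced the costs, which is the training and inference gap the abstract says a differentiable matching is meant to bridge.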
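Keywords	multi-object tracking; relation modeling; data association; motion model; occlusion scenes
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/48849
Collection	Graduates_Master's theses
Corresponding author	张广耀
Recommended citation (GB/T 7714)	张广耀. 基于关系建模的视觉多目标跟踪[D]. 自动化研究所. 中国科学院大学,2022.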

Files in this item
File name/size	Document type	Version type	Access type	License
学位论文.pdf (14603 KB)	Thesis		Restricted access	CC BY-NC-SA