Research on Robust Tracking Algorithms Based on Spatial-Temporal Association Methods


	Research on Robust Tracking Algorithms Based on Spatial-Temporal Association Methods
	Liu Kaiwen (刘凯文)
	2022-05-20
Pages	90
Degree type	Master's
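Abstract (translated from Chinese)	Object tracking is one of the fundamental tasks in computer vision, with broad application prospects and significant research value. In recent years, the wide adoption of deep learning and the release of many large-scale datasets have markedly improved the accuracy of tracking algorithms. Tracking methods based on Siamese neural networks currently attract broad attention in the field because they combine accuracy with real-time performance. In essence, these methods treat tracking as a template-matching problem; the approach is simple and efficient, and it makes full use of offline training on large-scale datasets so that the network learns a feature-matching space suited to tracking. Although such trackers perform well, closer study shows that current Siamese trackers are not robust: they perform poorly in complex scenes involving occlusion, fast motion, and similar-object distractors. Robustness is one of the core metrics of a tracking algorithm, and losing the target for a long time markedly reduces tracking accuracy. Yet much current work concentrates on appearance features (for example, online update methods and backbone network design) and does not fully exploit temporal information or spatial-temporal association to improve robustness. This thesis therefore centers on the drift problem and robustness, and studies in depth how spatial-temporal association methods affect tracker robustness. The main work is as follows. (1) Quantitative definition of drift: the thesis studies the patterns of output-state change across consecutive frames, gives a quantitative definition of drift, and identifies two types of drift, jump drift and adjacent drift. It also proposes using the proportion of frames that fall into each state-change pattern to reflect robustness and to analyze quantitatively how well a tracker suppresses drift. (2) Spatial-temporal constraint based on motion prediction: the thesis first analyzes the shortcomings of the post-processing pipeline in current Siamese methods and finds that using a cosine window as the penalty term to assess regressed-box quality is not entirely fair; a reference box and motion information are needed for assessment and selection. It then uses a Kalman filter as the motion prediction model, combines it with the outputs of previous frames to generate a predicted box for the current frame as the reference box, and evaluates all regressed boxes against it. The thesis proposes the IoU-Guided spatial constraint method and, through extensive comparative experiments, determines how to combine the spatial constraint with the network's classification scores, effectively mitigating jump drift. (3) Neural network method based on spatial-temporal relation modeling: the thesis proposes a two-stage pipeline to handle adjacent drift. The first stage extracts candidate targets with a Siamese network; the second stage uses Self-attention and Cross-attention modules to enhance candidate features, fully exploiting the relative spatial positions among candidates within a frame and the correspondences between candidates in adjacent frames, and uses these for matching and association to make the output more robust. Experiments show that this method effectively mitigates adjacent drift. In summary, the thesis focuses on drift during tracking, studies spatial-temporal association mechanisms for improving tracker robustness, and shows through extensive experiments that the proposed methods all effectively mitigate drift.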
English abstract	Visual object tracking is one of the fundamental tasks in computer vision, with broad application prospects and significant research value. In recent years, the widespread adoption of deep learning and the release of many large-scale datasets have significantly improved the accuracy of tracking algorithms. Tracking algorithms based on Siamese networks have attracted extensive attention because they combine high accuracy with real-time performance. Siamese networks essentially treat tracking as a matching problem given a fixed template. These methods are concise yet effective, which can be attributed to offline training on large-scale datasets, through which the networks learn to distinguish the object from the background. However, these methods are far from robust under occlusion, fast motion, and similar distractors. Robustness is one of the key metrics of a tracking algorithm: long-term tracking failure causes a drastic decline in accuracy. Nevertheless, most studies have focused on appearance models, such as online updating and backbone design, rather than on the robustness of the decision, and have not exploited spatial-temporal information or spatial-temporal association mechanisms to enhance robustness. This thesis therefore focuses on the robustness of tracking algorithms and the drift problem, and studies the spatial-temporal association mechanism in depth. The main contributions are summarized as follows. (1) Quantitative Definition of Drift: The thesis studies the cases of state change across consecutive frames and proposes a quantitative definition of drift. It identifies two kinds of drift, jump drift and adjacent drift. Moreover, it uses the distribution of frame ratios across the different cases to evaluate robustness and to analyze how effectively a method mitigates drift. (2) Spatial-temporal Constraint Based on Motion Prediction: First, the thesis shows that using a Hanning window function to evaluate the quality of all regressed bounding boxes is not entirely fair; a motion model is needed to predict a reference bounding box so that motion information enters the evaluation. Second, a Kalman filter predicts the reference box from previous outputs to approximate the ground-truth box in the current frame. The reference box is then used to evaluate the accuracy of all regressed bounding boxes, and the results are combined with the classification scores produced by the Siamese network; this method is named the IoU-Guided penalty. It effectively mitigates jump drift. (3) Spatial-temporal Relation Network: The thesis proposes a two-stage method to deal with adjacent drift. The first stage extracts candidate targets using the Siamese network, while the second stage adopts Self-attention and Cross-attention modules to enhance the features of the candidates and fully exploit spatial-temporal information. The enhanced features are used to construct a cost matrix and compute the optimal matching, which makes the tracker's output more robust. The results show that this method effectively mitigates adjacent drift. In summary, the thesis focuses on the robustness of tracking algorithms and the drift problem, and exploits spatial-temporal association mechanisms to enhance robustness and mitigate drift. Experiments show that the proposed methods are effective and efficient.
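Keywords	object tracking, robustness, spatial-temporal association, spatial-temporal constraint, spatial-temporal relation modeling network
Research field	Pattern Recognition
Discipline category	Engineering::Control Science and Engineering
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/48838
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems_Video Content Security
Recommended citation (GB/T 7714)	Liu Kaiwen (刘凯文). Research on Robust Tracking Algorithms Based on Spatial-Temporal Association Methods[D]. Institute of Automation, Chinese Academy of Sciences. Institute of Automation, Chinese Academy of Sciences, 2022.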

Files in this item
File name/size	Document type	Version type	Access type	License
Thesis.pdf (4951KB)	Thesis		Open access	CC BY-NC-SA
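
Illustrative note on contribution (2) of the abstract: each regressed box is scored against a reference box predicted from earlier outputs, and that score is combined with the classification score. The sketch below shows the general idea only and rests on assumptions that are not taken from the thesis. A constant-velocity extrapolation stands in for the Kalman filter named in the abstract, and the combination rule `cls * iou ** weight` is a guess, since the abstract says the actual rule was chosen experimentally without stating it. The names `iou`, `predict_reference`, `rerank`, and `weight` are all hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def predict_reference(prev: Box, last: Box) -> Box:
    """Constant-velocity extrapolation of the next box.

    Assumption: this replaces the Kalman filter from the abstract.
    """
    return (
        2 * last[0] - prev[0],
        2 * last[1] - prev[1],
        2 * last[2] - prev[2],
        2 * last[3] - prev[3],
    )


def rerank(
    boxes: Sequence[Box],
    cls_scores: Sequence[float],
    reference: Box,
    weight: float = 1.0,
) -> int:
    """Return the index of the box maximising cls * iou**weight.

    Assumption: the multiplicative rule is illustrative, not the thesis's.
    """
    scores = [s * iou(b, reference) ** weight for b, s in zip(boxes, cls_scores)]
    return max(range(len(scores)), key=scores.__getitem__)


# Two previous outputs moving right by 2 px per frame.
reference = predict_reference((0, 0, 10, 10), (2, 0, 12, 10))
candidates = [(5, 0, 15, 10), (60, 40, 70, 50)]  # nearby box, far distractor
cls_scores = [0.6, 0.9]  # the distractor has the higher classification score
best = rerank(candidates, cls_scores, reference)  # selects the nearby box (index 0)
```

In this toy case the distractor wins on classification score alone, but it does not overlap the predicted reference box, so the re-ranked score picks the nearby candidate. That is the kind of jump the abstract says the spatial constraint is meant to suppress.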