Research on Multi-Target Tracking Methods Based on Trajectory Prediction and Pedestrian Re-identification Models

李雪松

2020-05-28

Pages

102

Degree type

Master's

Chinese abstract

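With the development of artificial intelligence and the application of computer vision in recent years, multi-target tracking has attracted growing attention and become a research hotspot. The task is concerned with the bounding-box coordinates and identity labels of multiple targets in a video scene, which together form tracking trajectories over consecutive frames. It is more challenging than object detection and single-target tracking, and involves more problems and difficulties. These include not only factors common to computer vision tasks, such as illumination changes, image blur, and environmental interference, but also mutual occlusion between targets and targets entering or leaving the scene at any time. Complex scenes additionally exhibit high similarity between targets of different classes, large differences between targets of the same class, high target density, and changes in target scale, all of which pose serious challenges for multi-target tracking. Moreover, annotating consecutive frames is costly, it is hard to cover every type of challenge, and hard examples are scarce. This thesis explores and studies these problems in depth.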

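To address target loss caused by trajectory interactions or missed detections in multi-target tracking, this thesis proposes a recurrent neural network model based on spatio-temporal information that predicts a moving target's position at the next time step. Temporal motion features are extracted by modeling the context of the target's trajectory. Because other targets in the same scene affect the tracked target's trajectory, a spatial interaction module extracts interaction features between targets. Modeling the spatio-temporal information of moving targets in this way improves the accuracy of trajectory prediction and the robustness of the motion features. The method is validated on pedestrian trajectory prediction datasets and achieves good results on multiple evaluation datasets.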

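To address identity switches caused by occlusion or false detections in multi-target tracking, this thesis builds on an improved pedestrian re-identification method to propose an appearance model based on pose information and an attention mechanism. By fusing semantic information with the attention mechanism, the model generates a hard attention map and a soft attention map that separate foreground from background, enhancing the target's foreground information while suppressing background noise in order to extract discriminative appearance features. The method is validated on pedestrian re-identification datasets and achieves good performance on multiple public datasets.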

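To address the large variation among targets in multi-target tracking and the difficulty of tracking hard examples such as targets with large scale changes, this thesis proposes a data association method based on multi-feature fusion. The core idea is to fuse the extracted appearance and motion features and, from the fused features, compute a similarity matrix between the already associated tracklets and the detections. Using the similarity matrix and the reliability score of each tracklet, the method performs matching based on secondary association, so that the detections to be tracked are accurately associated with the tracklets already generated. Data association is carried out frame by frame, which ultimately yields multi-target tracking over the whole video scene. The proposed tracking method and each of its modules are tested and evaluated on public datasets and compared with other advanced methods, achieving good results.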

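To address the lack of large-scale, finely annotated datasets for multi-target tracking and the time and labor that manual annotation requires, this thesis proposes a multi-target tracking method based on virtual and real datasets. By combining real and virtual datasets, it carries out controllable, observable, and repeatable computational experiments on multi-target tracking, which addresses the insufficient training caused by limited data and further improves tracker performance.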

English abstract

With the development of artificial intelligence and the growing application of computer vision in recent years, multi-target tracking has received increasing attention and become a research hotspot. Multi-target tracking outputs bounding-box coordinates and identity labels for multiple objects in a video scene and links them into trajectories across consecutive frames. It is more challenging than object detection or single-target tracking. Beyond difficulties common to computer vision tasks, such as illumination changes, image blur, and environmental interference, it must handle mutual occlusion between targets and targets entering or leaving the scene at any time. Complex scenes add further problems: high similarity between targets of different classes, large differences between targets of the same class, high target density, and changes in target scale. These problems pose serious challenges for multi-target tracking. In addition, annotating targets in consecutive frames is costly, so it is difficult to cover every type of challenge and hard examples are scarce. To address these problems, this thesis proposes several methods.

To address target loss caused by occlusion or missed detections, this thesis proposes a recurrent neural network model based on spatio-temporal information that predicts a target's position at the next time step. The model extracts temporal motion features by modeling the context of each trajectory. Because other targets in the same scene affect the trajectory of the target of interest, the thesis also integrates interaction information into the motion features, which improves the accuracy of trajectory prediction and the robustness of the tracker. The method is validated on pedestrian trajectory prediction datasets and achieves good results on multiple evaluation datasets.

To address the identity (ID) switches that often occur when a target temporarily leaves the scene and then reappears, this thesis uses an improved pedestrian re-identification method to build an appearance model based on pose information and an attention mechanism. The model fuses semantic information with the attention mechanism to generate a hard attention map and a soft attention map that separate foreground from background. Discriminative appearance features are then extracted by enhancing the target's foreground information and suppressing background noise. The method is validated on pedestrian re-identification datasets and achieves good performance on multiple public datasets.
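As a rough illustration of how a hard and a soft attention map might be combined when pooling appearance features, the sketch below computes an attention-weighted average over a small feature map. The abstract does not say how the thesis combines the two maps, so the function name `attention_pooled_feature`, the `background_weight` parameter, and the multiplicative combination are all assumptions made for illustration.

```python
def attention_pooled_feature(features, hard_mask, soft_map, background_weight=0.0):
    """Attention-weighted average pooling (illustrative assumption, not the thesis's model).

    features:  H x W x C nested lists of floats (a feature map)
    hard_mask: H x W nested lists of 0/1 (1 = foreground, e.g. from pose or parsing)
    soft_map:  H x W nested lists of floats in [0, 1] (learned attention scores)
    background_weight: factor applied to background pixels (0.0 removes them)
    """
    channels = len(features[0][0])
    pooled = [0.0] * channels
    total_weight = 0.0
    for i, row in enumerate(features):
        for j, pixel in enumerate(row):
            # Foreground pixels keep their soft score; background is scaled down.
            scale = 1.0 if hard_mask[i][j] else background_weight
            weight = scale * soft_map[i][j]
            total_weight += weight
            for c in range(channels):
                pooled[c] += weight * pixel[c]
    if total_weight == 0.0:
        return pooled  # no attended pixels: return the zero vector
    return [value / total_weight for value in pooled]


# Usage: a 1 x 2 feature map with 2 channels; the second pixel is background.
feature_map = [[[1.0, 0.0], [0.0, 4.0]]]
descriptor = attention_pooled_feature(feature_map, [[1, 0]], [[0.5, 0.9]])
print(descriptor)
```

With `background_weight=0.0` the background pixel contributes nothing, so the descriptor depends only on the foreground pixel. A small positive value would instead soften the suppression.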

To address the large variation among targets in multi-target tracking and the difficulty of tracking hard examples such as targets with large scale changes, this thesis proposes a data association method based on multi-feature fusion. The core idea is to fuse the extracted appearance and motion features and compute a similarity matrix from the fused features. Using the similarity matrix and the reliability score of each tracklet, the method performs matching based on secondary association, so that detections of the targets to be tracked are accurately associated with the tracklets already generated. Data association is performed frame by frame, which yields multi-target tracking over the video scene. The tracker and each of the modules above are tested and evaluated on public datasets and compared with other advanced methods, achieving good results.
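The association step described above (fused similarity, tracklet reliability, secondary association) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's algorithm: the fusion is a fixed weighted sum of cosine appearance similarity and box overlap (IoU) with a predicted box, the matching is greedy rather than optimal, and the names `fused_similarity`, `two_stage_associate`, `w_app`, `rel_thresh`, `strict`, and `loose` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two appearance feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fused_similarity(tracklets, detections, w_app=0.5):
    """Similarity matrix: rows are tracklets, columns are detections (assumed fusion)."""
    return [[w_app * cosine(t["feat"], d["feat"])
             + (1.0 - w_app) * iou(t["pred_box"], d["box"])
             for d in detections] for t in tracklets]


def greedy_match(sim, rows, cols, thresh):
    """Greedily pair rows and columns in order of decreasing similarity."""
    pairs = sorted(((sim[r][c], r, c) for r in rows for c in cols), reverse=True)
    used_rows, used_cols, matches = set(), set(), []
    for score, r, c in pairs:
        if score < thresh:
            break  # all remaining pairs are below the threshold
        if r not in used_rows and c not in used_cols:
            matches.append((r, c))
            used_rows.add(r)
            used_cols.add(c)
    return matches


def two_stage_associate(tracklets, detections, rel_thresh=0.5, strict=0.6, loose=0.3):
    """Stage 1: reliable tracklets, strict threshold. Stage 2: the rest, looser threshold."""
    sim = fused_similarity(tracklets, detections)
    reliable = [i for i, t in enumerate(tracklets) if t["reliability"] >= rel_thresh]
    unreliable = [i for i, t in enumerate(tracklets) if t["reliability"] < rel_thresh]
    first = greedy_match(sim, reliable, list(range(len(detections))), strict)
    taken = {c for _, c in first}
    remaining = [c for c in range(len(detections)) if c not in taken]
    return first + greedy_match(sim, unreliable, remaining, loose)


# Usage: two tracklets (one reliable, one not) and two detections in swapped order.
tracklets = [
    {"feat": [1.0, 0.0], "pred_box": (0, 0, 10, 10), "reliability": 0.9},
    {"feat": [0.0, 1.0], "pred_box": (20, 20, 30, 30), "reliability": 0.2},
]
detections = [
    {"feat": [0.0, 1.0], "box": (21, 20, 31, 30)},
    {"feat": [1.0, 0.0], "box": (0, 1, 10, 11)},
]
matches = two_stage_associate(tracklets, detections)
print(matches)  # list of (tracklet_index, detection_index) pairs
```

Greedy matching is used here only to keep the sketch dependency-free. A tracker would more commonly solve each stage with an optimal assignment solver such as `scipy.optimize.linear_sum_assignment`.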

To address the lack of large-scale, finely annotated datasets for multi-target tracking and the time and labor that manual annotation requires, this thesis proposes a multi-target tracking framework based on virtual and real datasets. By combining real and virtual datasets, it carries out controllable, observable, and repeatable computational experiments on multi-target tracking, which addresses the insufficient training caused by limited data and further improves tracker performance.

Keywords

multi-target tracking; trajectory prediction; pedestrian re-identification; multi-feature fusion; deep learning

Language

Chinese

Seven major directions: sub-direction classification

Artificial intelligence + transportation

Document type

Thesis

Item identifier

http://ir.ia.ac.cn/handle/173211/39066

Collection

Graduates_Master's theses

Recommended citation
GB/T 7714

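李雪松. Research on Multi-Target Tracking Methods Based on Trajectory Prediction and Pedestrian Re-identification Models [D]. Online. University of Chinese Academy of Sciences, 2020.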

Files in this item
File name/size	Document type	Version type	Access type	License
李雪松硕士学位论文.pdf (7796 KB)	Thesis		Restricted access	CC BY-NC-SA