With the development of artificial intelligence and the spread of computer vision applications in recent years, multi-target tracking has received increasing attention and has become a research hotspot. A multi-target tracker outputs bounding-box coordinates and identity labels and generates trajectories for multiple objects in a video scene. Multi-target tracking builds on object detection and single-target tracking and faces more difficulties than either. Besides problems common to computer vision tasks, such as illumination changes, image blur, and environmental interference, it must also handle mutual occlusion between targets and targets entering or leaving the scene at any time. Complex scenes add further problems: high similarity between targets of different classes, large differences among targets of the same class, high target density, and changes in target scale. These problems pose serious challenges for multi-target tracking. In addition, annotating targets across consecutive frames is costly, so existing datasets struggle to cover the full range of challenges, and hard samples are scarce. To address these problems, this dissertation proposes the following methods.
To address the frequent loss of tracked targets caused by occlusion or missed detections, this dissertation proposes a recurrent neural network model based on spatio-temporal information. The model predicts the next position of a trajectory and extracts temporal motion features by modeling the context of trajectories. Because other targets in the same scene affect the trajectory of the target of interest, the model also integrates interaction information into the motion features, which improves the accuracy of trajectory prediction and the robustness of the tracker. The method is evaluated on pedestrian trajectory prediction datasets and achieves good results on multiple evaluation datasets.
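The prediction step above can be illustrated with a toy recurrent model. This is a minimal NumPy sketch under stated assumptions, not the dissertation's architecture: the class name `InteractionRNN`, the plain Elman-style cell (no LSTM or GRU gating), the random untrained weights, and the distance-weighted pooling of other tracks' hidden states used as the interaction term are all hypothetical choices made for illustration.

```python
import numpy as np

class InteractionRNN:
    """Toy recurrent predictor of each track's next (x, y) position.

    Hypothetical sketch: weights are random and untrained, and the interaction
    term is a distance-weighted mean of the other tracks' hidden states.
    """

    def __init__(self, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.h = hidden
        self.W_in = rng.normal(0, 0.1, (hidden, 2))        # embeds a displacement
        self.W_hh = rng.normal(0, 0.1, (hidden, hidden))   # recurrent (temporal) weights
        self.W_int = rng.normal(0, 0.1, (hidden, hidden))  # interaction weights
        self.W_out = rng.normal(0, 0.1, (2, hidden))       # hidden state -> displacement

    def interaction(self, H, pos):
        """For each track, average the other tracks' hidden states, weighting closer ones more."""
        out = np.zeros_like(H)
        for i in range(len(pos)):
            d = np.linalg.norm(pos - pos[i], axis=1)
            w = np.exp(-d)
            w[i] = 0.0  # a track does not interact with itself
            if w.sum() > 0:
                out[i] = (w[:, None] * H).sum(axis=0) / w.sum()
        return out

    def predict_next(self, tracks):
        """tracks: array of shape (n_tracks, T, 2); returns predicted positions (n_tracks, 2)."""
        n, T, _ = tracks.shape
        H = np.zeros((n, self.h))
        for t in range(1, T):
            delta = tracks[:, t] - tracks[:, t - 1]          # temporal motion input
            social = self.interaction(H, tracks[:, t])        # influence of neighbors
            H = np.tanh(delta @ self.W_in.T + H @ self.W_hh.T + social @ self.W_int.T)
        return tracks[:, -1] + H @ self.W_out.T               # last position + predicted step
```

For example, `InteractionRNN().predict_next(tracks)` on two parallel three-frame tracks returns one predicted position per track. Predicting a displacement and adding it to the last observed position, rather than regressing absolute coordinates, keeps the output anchored to the track even when the weights are untrained.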
To address ID switches caused by targets that temporarily leave the scene and later reappear, this dissertation builds an appearance model on an improved pedestrian re-identification method that uses pose information and an attention mechanism. The method combines semantic information with the attention mechanism to generate a hard attention map and a soft attention map that separate foreground from background, and it extracts discriminative appearance features by enhancing the target's foreground information and suppressing background noise. The method is evaluated on pedestrian re-identification datasets and performs well on multiple public datasets.
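The foreground/background separation described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the dissertation's model: the hard map is assumed to come from pose keypoints (Gaussian blobs thresholded into a binary mask), the soft map is assumed to be a sigmoid of the channel-averaged feature activation, and the names `keypoint_hard_mask`, `soft_attention`, `attend`, and the parameters `sigma`, `thresh`, and `bg_weight` are hypothetical.

```python
import numpy as np

def keypoint_hard_mask(keypoints, h, w, sigma=1.5, thresh=0.1):
    """Binary foreground mask: union of Gaussian blobs around (row, col) pose keypoints."""
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.zeros((h, w))
    for ky, kx in keypoints:
        blob = np.exp(-((ys - ky) ** 2 + (xs - kx) ** 2) / (2 * sigma ** 2))
        heat = np.maximum(heat, blob)
    return (heat > thresh).astype(float)

def soft_attention(feat):
    """Per-pixel weights in (0, 1): sigmoid of the channel-averaged activation."""
    return 1.0 / (1.0 + np.exp(-feat.mean(axis=0)))

def attend(feat, keypoints, bg_weight=0.1):
    """feat: (C, H, W) feature map. Returns a C-dim descriptor that emphasizes the foreground."""
    C, H, W = feat.shape
    hard = keypoint_hard_mask(keypoints, H, W)
    soft = soft_attention(feat)
    # Inside the hard mask keep the soft weights; outside, damp them (suppress, not zero out).
    att = hard * soft + bg_weight * (1.0 - hard) * soft
    weighted = feat * att[None]
    return weighted.sum(axis=(1, 2)) / att.sum()
```

Combining the two maps multiplicatively lets the hard map decide where the person is while the soft map still ranks pixels inside that region. Keeping a small `bg_weight` instead of zeroing the background is a design choice in this sketch, so that a slightly misplaced keypoint does not discard useful context entirely.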
To address large differences among targets and the difficulty of tracking hard samples, such as targets whose scale changes sharply, this dissertation proposes a data association method based on multi-feature fusion. The core idea is to fuse the extracted appearance and motion features and to compute a similarity matrix from the fused features. Using this similarity matrix together with the reliability scores of tracklets, the method performs association in two rounds (secondary association) so that detections of the targets to be tracked are accurately matched to the existing tracklets. Association is performed frame by frame, which yields multi-target tracking over the whole video. The complete tracker and each of the modules above are evaluated on public datasets and compared with other state-of-the-art methods, achieving good results.
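The fusion and two-round association described above can be sketched for a single frame. This is a minimal NumPy illustration under stated assumptions, not the dissertation's algorithm: appearance similarity is assumed to be cosine similarity, motion similarity an exponential decay of the distance between a tracklet's predicted position and a detection, and fusion a weighted sum with hypothetical weight `alpha`. A greedy matcher stands in for the Hungarian algorithm to keep the example dependency-free, and all function names and thresholds (`rel_thresh`, `sim1`, `sim2`) are hypothetical.

```python
import numpy as np

def cosine_sim(A, B):
    """Pairwise cosine similarity between rows of A (tracklets) and B (detections)."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

def motion_sim(pred_pos, det_pos, scale=10.0):
    """Similarity that decays with distance between predicted and detected positions."""
    d = np.linalg.norm(pred_pos[:, None, :] - det_pos[None, :, :], axis=2)
    return np.exp(-d / scale)

def fused_similarity(trk_app, det_app, trk_pred, det_pos, alpha=0.5):
    """Weighted fusion of appearance and motion similarity into one matrix."""
    return alpha * cosine_sim(trk_app, det_app) + (1 - alpha) * motion_sim(trk_pred, det_pos)

def greedy_match(S, rows, cols, min_sim):
    """Greedy stand-in for Hungarian matching on S restricted to the given rows and columns."""
    pairs, rows, cols = [], list(rows), list(cols)
    while rows and cols:
        sub = S[np.ix_(rows, cols)]
        r, c = np.unravel_index(np.argmax(sub), sub.shape)
        if sub[r, c] < min_sim:
            break  # best remaining pair is too dissimilar
        pairs.append((rows[r], cols[c]))
        rows.pop(r)
        cols.pop(c)
    return pairs

def two_stage_associate(S, reliability, rel_thresh=0.6, sim1=0.5, sim2=0.3):
    """Round 1: reliable tracklets, strict threshold.
    Round 2: all leftover tracklets vs. leftover detections, looser threshold."""
    n_trk, n_det = S.shape
    reliable = [i for i in range(n_trk) if reliability[i] >= rel_thresh]
    others = [i for i in range(n_trk) if reliability[i] < rel_thresh]
    first = greedy_match(S, reliable, range(n_det), sim1)
    used_t = {t for t, _ in first}
    used_d = {d for _, d in first}
    left_t = [t for t in reliable if t not in used_t] + others
    left_d = [d for d in range(n_det) if d not in used_d]
    return first + greedy_match(S, left_t, left_d, sim2)
```

For example, with `S = np.array([[0.9, 0.1], [0.2, 0.4]])` and reliability scores `[0.9, 0.2]`, the reliable tracklet 0 takes detection 0 in the first round, and the unreliable tracklet 1 picks up detection 1 in the second round. Letting reliable tracklets choose first keeps them from losing their detections to weaker tracklets; the looser second-round threshold then recovers matches for uncertain cases.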
Large-scale, finely annotated datasets for multi-target tracking are lacking, and manual annotation is time-consuming and labor-intensive. To address this, this dissertation proposes a multi-target tracking framework based on both virtual and real datasets. Using real and virtual data, the framework performs controllable, observable, and repeatable computational experiments on multi-target tracking, which alleviates the under-training caused by insufficient data and further improves the performance of the tracker.
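One way to make such experiments controllable and repeatable is to fix both the mix of real and virtual samples and the random seed. The sketch below assumes that the mixing happens at the training-batch level. That assumption, the generator name `mixed_batches`, and the `virtual_ratio` parameter are hypothetical and are not taken from the dissertation.

```python
import random

def mixed_batches(real, virtual, batch_size, virtual_ratio, n_batches, seed=0):
    """Yield training batches that mix real and virtual samples at a fixed ratio.

    virtual_ratio is the controllable knob; the fixed seed makes every run repeatable.
    """
    rng = random.Random(seed)  # private generator, independent of global random state
    n_virtual = round(batch_size * virtual_ratio)
    n_real = batch_size - n_virtual
    for _ in range(n_batches):
        batch = rng.sample(real, n_real) + rng.sample(virtual, n_virtual)
        rng.shuffle(batch)  # avoid a fixed real-then-virtual ordering inside the batch
        yield batch
```

Sweeping `virtual_ratio` while keeping `seed` fixed gives a repeatable set of runs in which the amount of virtual data is the only variable that changes.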