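Research on Key Problems in Multi-Camera Visual Object Tracking
Chen Weihua (陈威华)
Degree type: Doctor of Engineering
Advisor: Huang Kaiqi (黄凯奇)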
2017-11
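Degree-granting institution: Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Place of conferral: Beijing
Keywords: single object tracking; single-camera multi-object tracking; inter-camera multi-object tracking; multi-camera object tracking; person re-identification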
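Abstract: Multi-camera object tracking is an important task in computer vision with wide applications, particularly in intelligent video surveillance. In recent years, as public concern for security has grown and more and more cameras have been installed in cities around the world, the demands placed on intelligent video surveillance have risen accordingly. As one of its core problems, multi-camera object tracking has attracted growing attention from researchers. Designing a practical multi-camera object tracking algorithm involves many components, including object detection, single object tracking, single-camera multi-object tracking, and inter-camera multi-object tracking, and each of these has its own problems and difficulties. This thesis aims to build a complete multi-camera object tracking algorithm, studies the key problems in several of these components, and proposes corresponding improvements. The specific contributions are as follows: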

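(1) In multi-camera object tracking, the sheer number of detections greatly reduces the speed and accuracy of the subsequent data association. We therefore first run a simple single object tracker on each target to obtain initial tracklets. However, current single object tracking methods can hardly guarantee accuracy, and erroneous tracklets seriously harm the later multi-object tracking. To ensure tracking accuracy, we propose a single object tracking method based on adaptive multi-feature fusion. The method learns how stable each feature is during tracking and dynamically selects the most stable feature, the one best suited to tracking, to dominate the tracking process. In addition, we control feature updates in real time according to the currently estimated tracking accuracy, which alleviates the drift problem to some extent. In the single object tracking challenge held at ICCV 2013, our tracker ranked fourth in accuracy among all 27 teams.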

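(2) After the initial tracklets are obtained, mainstream multi-camera object tracking methods split the problem into two steps, single-camera object tracking and inter-camera object tracking, and handle them independently. In this setting, any error in single-camera tracking is further amplified in the subsequent inter-camera tracking and degrades the final result. This thesis proposes an equalized global graph model for multi-camera object tracking. It places single-camera and inter-camera tracking in one graph model and optimizes them jointly, overcoming the error amplification caused by separate optimization. Moreover, research on multi-camera object tracking has not yet settled on a unified evaluation criterion or a public dataset. We propose an evaluation criterion with clearer metrics and, on that basis, release a more representative dataset.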

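(3) Having completed the whole multi-camera object tracking system, we found that the component with the greatest impact on multi-camera tracking is feature matching across cameras. Taken in isolation, this is the classic person re-identification problem. Many current mainstream methods treat person re-identification as either a classification problem or a ranking problem. By examining the relationship between the two, we found that they are complementary. We therefore propose two ways to fuse them effectively: 1) a multi-task deep network that optimizes the classification and ranking problems simultaneously; 2) a quadruplet loss function that retains the advantages of the classification and ranking loss functions while avoiding, to some extent, their respective shortcomings. Both methods outperform previous methods on public datasets.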
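To make the quadruplet idea in contribution (3) concrete, here is a minimal Python sketch. This record does not give the thesis's formulation, so the form below is an assumption: a triplet-style ranking term plus a second hinge term that compares the positive pair against a pair of two different negative identities. The function names, the margins `margin1` and `margin2`, and the toy feature vectors are all hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def quadruplet_loss(anchor, positive, negative1, negative2,
                    margin1=1.0, margin2=0.5):
    """Assumed quadruplet loss for one sample (illustrative only).

    anchor and positive share an identity; negative1 and negative2 come from
    two other identities that differ from the anchor and from each other.
    """
    d_ap = squared_distance(anchor, positive)
    d_an = squared_distance(anchor, negative1)
    d_nn = squared_distance(negative1, negative2)

    # Ranking term: the positive pair should be closer than the
    # anchor-negative pair by at least margin1 (the usual triplet constraint).
    ranking_term = max(0.0, d_ap - d_an + margin1)

    # Extra term: the positive pair should also be closer than a pair of
    # unrelated negatives, which does not involve the anchor at all.
    push_term = max(0.0, d_ap - d_nn + margin2)

    return ranking_term + push_term


# Toy example: the triplet constraint holds, but the two negatives are as
# close to each other as the positive pair, so only the second term is active.
loss = quadruplet_loss([0, 0], [1, 0], [1, 1], [1, 2])
```

In this toy case the ranking term is zero while the second term contributes `margin2`, which illustrates how the extra pair adds a constraint that a plain triplet loss would not see.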
Other abstract: Multi-camera multi-object tracking is an important task in computer vision. It has wide applications, especially in intelligent video surveillance. In recent years, with growing attention to public security and the spread of cameras in cities throughout the world, the requirements placed on intelligent video surveillance have risen. As one of its core components, multi-camera multi-object tracking is attracting more and more researchers. Designing a practical multi-camera multi-object tracking algorithm involves many steps, including object detection, single object tracking, single-camera multi-object tracking, and inter-camera multi-object tracking, and each step has many problems of its own. In this thesis, we build a complete multi-camera multi-object tracking algorithm and improve on the key problems in these steps. Our contributions are listed below:

(1) In multi-camera multi-object tracking, the huge number of detections reduces the speed and accuracy of the data association process. Therefore, we first track every detection with a single object tracking method to obtain initial tracklets, which are then used for further data association. However, current single object tracking methods cannot guarantee tracking accuracy, and the wrong tracklets they produce seriously damage the whole multi-camera multi-object tracking algorithm. To address this, we propose a more accurate single object tracking method that uses an adaptive combination of multiple features. Our method adaptively selects the most robust features for tracking, based on the learned feature invariance. Meanwhile, we use the estimated tracking accuracy to control the feature update, which partly solves the drift problem. In the ICCV 2013 Visual Object Tracking Challenge (VOT2013), our method ranked 4th among 27 teams under the accuracy protocol.

(2) After obtaining the initial tracklets, most multi-camera multi-object tracking methods follow two steps, single-camera multi-object tracking and inter-camera multi-object tracking, and solve them separately. In this case, any errors in single-camera multi-object tracking are magnified in inter-camera multi-object tracking and seriously damage the whole algorithm. In this thesis, we present an equalized global graph model-based approach for multi-camera object tracking, which integrates single-camera and inter-camera multi-object tracking into one graph model and optimizes them together. This avoids the error magnification problem. Meanwhile, there are still no widely used datasets or evaluation criteria for multi-camera multi-object tracking. We provide a clearer evaluation criterion and a more representative dataset based on that criterion.

(3) After building the whole multi-camera multi-object tracking system, we find that feature representation and matching are the key problems for final performance. Considered on their own, these two problems can be treated as a person re-identification task. Many person re-identification methods approach it from either a classification perspective or a ranking perspective. We analyze the relationship between the two perspectives and find that each has its own advantages and that they are complementary. Therefore, we propose two ways to integrate them: 1) a multi-task deep network that optimizes the classification task and the ranking task simultaneously; 2) a quadruplet loss that combines the advantages of both tasks and partly avoids their weaknesses. Both achieve state-of-the-art results on current representative person re-identification datasets.
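The multi-task idea in contribution (3) can be pictured as a single objective that sums a classification loss and a ranking loss. The sketch below is only an illustration under stated assumptions. This record does not say how the classification task is defined or how the two losses are weighted, so the softmax cross-entropy over identity logits, the triplet hinge used as the ranking loss, the `weight` parameter, and all function names are hypothetical.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed via log-sum-exp."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_ranking_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss asking the positive to be closer than the negative."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)


def joint_loss(logits, label, anchor, positive, negative,
               weight=0.5, margin=1.0):
    """Weighted sum of a classification loss and a ranking loss (assumed form)."""
    classification = cross_entropy(logits, label)
    ranking = triplet_ranking_loss(anchor, positive, negative, margin)
    return weight * classification + (1.0 - weight) * ranking


# Toy example: uninformative logits over two identities, and a triplet whose
# negative is far enough away that the ranking term vanishes.
value = joint_loss([0.0, 0.0], 0, [0.0, 0.0], [0.0, 0.0], [3.0, 0.0])
```

Because both terms are computed from the same sample, a network trained on such an objective receives gradients from both tasks at once, which is the general sense in which the two views can complement each other.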
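Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/20383
Collection: Graduates_Doctoral Dissertations
Author affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended citation (GB/T 7714):
Chen Weihua. Research on Key Problems in Multi-Camera Visual Object Tracking [D]. Beijing: Graduate University of Chinese Academy of Sciences, 2017.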
Files in this item
File name/size: 答辩后提交版本.pdf (29116 KB)
Document type: Thesis
Access type: Not yet open
License: CC BY-NC-SA
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.