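Title: Research on Correlation Model-Based Visual Tracking Methods without Boundary Effect (无边界效应的相关模型视觉跟踪方法研究)
Author: 郑林宇 (Zheng Linyu)
Date: 2021-05
Pages: 150
Degree type: Doctoral
Abstract (translated from the Chinese)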

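With the continued development of the mobile Internet and the rapid spread of smart devices, image and video data have grown explosively. Given this massive volume of video data, accurately and efficiently parsing its semantics to extract the information people care about is of great practical importance. Visual object tracking localizes target objects of interest in a video sequence and thereby provides essential basic information for high-level video analysis tasks. Research on high-accuracy, high-efficiency visual object tracking methods therefore has significant academic and practical value.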

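This dissertation focuses on two classes of visual object tracking methods that have, in recent years, struck a good balance between localization accuracy and running speed: correlation filter-based methods and fully-convolutional Siamese network-based methods. Because the tracker models in both classes evaluate samples by means of a correlation operation, this dissertation refers to them collectively as correlation model-based visual object tracking methods.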

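Correlation filter-based methods train a discriminative model online to distinguish the target object from its surrounding background. Fully-convolutional Siamese network-based methods train a similarity-metric convolutional neural network offline and perform similarity matching online. Although both achieve high performance, problems remain in real, complex, and changeable tracking scenarios. On the one hand, because their localization accuracy suffers from the boundary effect, correlation filter-based trackers struggle to localize the target robustly when it moves abruptly. Some existing methods improve localization accuracy by mitigating the boundary effect, but because their tracking performance is limited by their modeling methods, they can hardly achieve high accuracy and high efficiency at the same time. On the other hand, although fully-convolutional Siamese network-based trackers are not affected by the boundary effect, they lack online updating of the target template and are therefore less robust to significant deformation of the target object.

To address these problems, this dissertation builds on correlation models without boundary effect and designs robust and efficient modeling methods and feature learning methods. For correlation filter-based trackers, the aim is to eliminate the boundary effect while further improving localization accuracy and running speed. For fully-convolutional Siamese network-based trackers, the aim is to improve robustness to significant deformation of the target object. The main results and contributions are summarized as follows: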

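1. To address the problem that correlation filter-based trackers cannot mitigate the boundary effect while also using the kernel trick to further improve localization accuracy, a tracking method based on Gaussian process regression is proposed. Gaussian process regression is introduced into tracking based on online discriminative model training and is used to model densely sampled real samples, so that the tracker is free of the boundary effect and can also use the kernel trick to improve localization accuracy. In addition, efficient online model update methods are designed to localize the target object accurately and robustly throughout a video sequence. Experiments on public datasets show that, with hand-crafted features, the proposed method achieved leading tracking accuracy among methods of the same period. A target scale estimation method based on Gaussian process regression is also proposed: by modeling the target object in scale space with Gaussian process regression, it achieves accurate and efficient scale estimation.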

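2. To address the problem that correlation filter-based trackers cannot mitigate the boundary effect while also making efficient use of high-dimensional convolutional neural network features, a fast kernelized correlation filter method without boundary effect is proposed. A kernel ridge regression model is used to model densely sampled real samples, which yields a kernelized correlation filter method without boundary effect. A "build table, look up table" scheme is then designed that substantially reduces the redundant computation in matrix construction, which accelerates both the construction of the kernel matrix and the training of the filter model. In comparable methods of the same period, the time complexity of filter training grows quadratically with the number of feature channels, whereas in the proposed method it grows linearly. The proposed tracker is therefore free of the boundary effect and can use high-dimensional convolutional neural network features efficiently, achieving both high localization accuracy and real-time running speed.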

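3. Correlation filter trackers with a weak boundary effect (or none) generally extract features using convolutional neural networks trained for image classification. Because the tracking task differs markedly from the image classification task, that feature space is not optimal for tracking. To address this problem, a boundary-effect-free correlation filter method based on feature space learning is proposed. A ridge regression solver with a differentiable closed-form solution is integrated into the offline training of the convolutional neural network, so that a large amount of offline tracking data drives the network to learn a feature space better suited to boundary-effect-free correlation filter tracking. During online tracking, the learned feature space replaces the commonly used feature space trained for image classification, which improves the robustness of the tracker at the level of feature representation and, in turn, its localization accuracy. Experiments show that the proposed method achieves leading localization accuracy and real-time running speed on all current mainstream tracking datasets.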

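4. To address the poor robustness of fully-convolutional Siamese network-based trackers to significant deformation of the target object, a tracking method based on deformable cross-correlation is proposed. A deformable cross-correlation operation is introduced, in a meta-learning manner, into the template matching process of fully-convolutional Siamese network-based trackers, which enables online adaptive deformable template matching. This improves robustness to significant deformation: a high similarity score can still be obtained even when the target has deformed significantly relative to its initial template. Experimental results show that the proposed method runs at real-time speed and, compared with similar methods, localizes the target more robustly when it undergoes significant deformation.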

English Abstract

With the development of the mobile Internet and the rapid popularization of intelligent terminals, image and video data have grown explosively. Accurately and efficiently extracting the information that people are interested in from images and videos is of great practical significance. By locating target objects of interest in videos, visual object tracking provides high-level video analysis tasks with the necessary basic information. Research on high-accuracy and high-efficiency visual object tracking methods therefore has important academic significance and practical value.

This dissertation focuses on two kinds of visual object tracking methods that have achieved a good balance between localization accuracy and running speed in recent years: correlation filter-based trackers (CF trackers) and fully-convolutional Siamese network-based trackers (Siamese trackers). Since the tracker models in both kinds of methods evaluate samples by means of a correlation operation, they are collectively referred to in this dissertation as correlation model-based visual tracking methods.
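To make the phrase "evaluate samples by means of a correlation operation" concrete, the following is a minimal sketch, not the dissertation's implementation. It assumes a single-channel search region and a raw-pixel template; the function name `cross_correlate` and all sizes are hypothetical. Each entry of the response map is the score of one candidate window, and the peak gives the estimated target position.

```python
import numpy as np

def cross_correlate(search, template):
    """Score every template-sized window of `search` against `template`."""
    sh, sw = search.shape
    th, tw = template.shape
    response = np.empty((sh - th + 1, sw - tw + 1))
    for i in range(response.shape[0]):
        for j in range(response.shape[1]):
            # Inner product between one candidate window and the template.
            response[i, j] = np.sum(search[i:i + th, j:j + tw] * template)
    return response

rng = np.random.default_rng(0)
search = rng.standard_normal((32, 32))          # synthetic search region
template = search[10:18, 5:13].copy()           # pretend the target sits at (10, 5)
template = template - template.mean()           # zero-mean template

response = cross_correlate(search, template)
peak = tuple(int(v) for v in np.unravel_index(np.argmax(response), response.shape))
print(peak)
```

In a real tracker the template would be a learned filter (CF trackers) or a CNN feature map (Siamese trackers) rather than raw pixels, but the evaluation step has this same sliding inner-product form.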

CF trackers train discriminative models online to distinguish the target object from its surrounding background. Siamese trackers train convolutional neural networks (CNNs) offline to measure the similarity between the template and candidates. Although both have achieved high performance, problems and challenges remain in complex and ever-changing tracking scenes. On the one hand, because their localization accuracy is degraded by the boundary effect, CF trackers have difficulty robustly locating a target that moves abruptly. Some methods improve the localization accuracy of CF trackers by reducing the boundary effect, but because their performance is limited by their modeling methods, they struggle to achieve both high accuracy and high efficiency. On the other hand, although Siamese trackers are not affected by the boundary effect, they lack online adaptive updating of the target template and are therefore less robust when the target undergoes significant deformation.
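The boundary effect mentioned above can be illustrated with a small sketch. It rests on an assumption not spelled out in this abstract: that standard FFT-based correlation filters implicitly train on circularly shifted copies of one base patch, which is the usual explanation of the effect. All variable names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((40, 40))           # synthetic frame
top, left, size, shift = 10, 10, 16, 4

base = image[top:top + size, left:left + size]

# Synthetic sample: the base patch circularly shifted 4 pixels to the left,
# so its content matches a window located 4 pixels to the right.
circular = np.roll(base, shift=-shift, axis=1)

# Real sample: the window actually cropped 4 pixels to the right in the frame.
real = image[top:top + size, left + shift:left + shift + size]

# The two agree where no wrap-around happened ...
interior_equal = np.allclose(circular[:, :size - shift], real[:, :size - shift])
# ... but the wrapped columns of the circular sample contain content that
# never appears at that position in the frame.
wrapped_equal = np.allclose(circular[:, size - shift:], real[:, size - shift:])
print(interior_equal, wrapped_equal)
```

The larger the shift, the larger the wrapped (unrealistic) part of the synthetic sample, which is consistent with the abstract's point that fast target motion is where the boundary effect hurts most. Modeling densely sampled real windows, as the contributions below describe, avoids the wrapped part entirely.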

To address the above problems, this dissertation builds on correlation models without boundary effect and designs robust and efficient modeling methods and feature learning methods to (1) eliminate the boundary effect of CF trackers while also improving their localization accuracy and running speed, and (2) improve the robustness of Siamese trackers to significant deformation of the target. Specifically, the main contributions of this dissertation are summarized as follows:

1. This dissertation proposes a Gaussian process regression-based tracking method to address the problem that CF trackers cannot reduce the boundary effect while also employing the kernel trick to improve their localization accuracy. By introducing Gaussian process regression into online discriminative tracking and modeling dense, real samples, the tracker is not only unaffected by the boundary effect but can also take advantage of the kernel trick to improve its localization accuracy. Furthermore, two efficient methods for updating the tracker online are designed, achieving accurate and robust tracking in video sequences. Experimental results on public benchmarks show that, with hand-crafted features, the proposed method achieves state-of-the-art accuracy. In addition, a scale estimation method based on Gaussian process regression is proposed. By using Gaussian process regression to model the target object in scale space, accurate and efficient scale estimation is achieved.

2. This dissertation proposes a fast kernelized correlation filter method without boundary effect to address the problem that CF trackers cannot reduce the boundary effect while also employing high-dimensional CNN features efficiently. By using kernel ridge regression to model dense, real samples, a kernelized correlation filter without boundary effect is developed. Furthermore, a table building and look-up scheme is designed that significantly reduces the redundant computation in constructing the kernel matrix, accelerating both the construction of the kernel matrix and the training of the filters. Whereas the time complexity of filter training in contemporaneous similar methods grows quadratically with the number of feature channels, that of the proposed method grows linearly. As a result, the proposed tracker is not only unaffected by the boundary effect but can also employ high-dimensional CNN features efficiently, obtaining both high localization accuracy and real-time running speed.

3. Trackers that reduce the boundary effect of correlation filters generally extract features with CNNs trained for the image classification task. However, because the visual object tracking task differs markedly from the image classification task, such feature embeddings are not optimal for visual tracking. To address this problem, this dissertation proposes an architecture that learns feature embeddings suited to correlation filter-based tracking. By integrating a ridge regression solver, which is differentiable and has a closed-form solution, into the offline training of CNNs, a large amount of offline training data is used to drive the CNNs to learn feature embeddings better suited to CF trackers without boundary effect. In online tracking, the learned feature embeddings replace the commonly used ones trained for the image classification task, improving the robustness of the tracker model from the perspective of feature representation and thereby improving the localization accuracy. Experimental results show that the proposed method achieves state-of-the-art accuracy and real-time running speed on all mainstream tracking benchmarks.

4. This dissertation proposes a Siamese deformable cross-correlation network for visual tracking to address the problem that Siamese trackers are less robust to significant deformation of targets. By introducing a deformable cross-correlation operation into the template matching of Siamese trackers in a meta-learning manner, online adaptive deformable template matching is realized, improving the robustness of the tracker model to significant deformation of targets. Even when the target object is significantly deformed relative to its initial template, a high similarity score can still be obtained. Experimental results show that the proposed method not only runs at real-time speed but also, compared with similar methods, localizes the target more robustly when it undergoes significant deformation.
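Contributions 1 to 3 all rest on regressing from densely sampled real windows to a target response, using a model with a closed-form solution. The sketch below is only an illustration under stated assumptions: raw pixels as features, a linear kernel, a Gaussian-shaped label map, and hypothetical helper names (`dense_real_samples`, `gaussian_labels`). It does not reproduce the dissertation's table building and look-up acceleration, its Gaussian process formulation, or its CNN training.

```python
import numpy as np

def dense_real_samples(region, size):
    """Crop every size x size window from `region` (real samples, no circular shifts)."""
    h, w = region.shape
    windows = [region[i:i + size, j:j + size].ravel()
               for i in range(h - size + 1)
               for j in range(w - size + 1)]
    return np.stack(windows)

def gaussian_labels(n_side, sigma=1.0):
    """Regression targets: a Gaussian peaked at the central window."""
    c = (n_side - 1) / 2.0
    yy, xx = np.mgrid[0:n_side, 0:n_side]
    return np.exp(-((yy - c) ** 2 + (xx - c) ** 2) / (2 * sigma ** 2)).ravel()

rng = np.random.default_rng(0)
region = rng.standard_normal((12, 12))          # synthetic search region
size, lam = 6, 0.1                              # window size, ridge regularizer

X = dense_real_samples(region, size)            # one row per real window
y = gaussian_labels(region.shape[0] - size + 1)

# Primal ridge regression: w = (X^T X + lam I)^{-1} X^T y
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Dual (kernel) form with a linear kernel: alpha = (K + lam I)^{-1} y
K = X @ X.T
alpha = np.linalg.solve(K + lam * np.eye(K.shape[0]), y)
w_dual = X.T @ alpha

print(np.allclose(w_primal, w_dual, atol=1e-6))
```

Replacing `K` with a nonlinear kernel, for example a Gaussian kernel, is what the kernel trick amounts to in the dual form. Because the solution consists only of matrix products and a linear solve, it is differentiable with respect to the features, which is the property that allows such a solver to be placed inside CNN training as described in contribution 3.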

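Keywords: object tracking, scale estimation, correlation filters, boundary effect, Gaussian process regression, convolutional neural networks, Siamese networks, deformable cross-correlation, feature space learning
Language: Chinese
Research direction (sub-direction classification): Object detection, tracking, and recognition
Document type: Dissertation
Item identifier: http://ir.ia.ac.cn/handle/173211/44883
Collection: 紫东太初大模型研究中心_图像与视频分析 (Zidong Taichu Large Model Research Center, Image and Video Analysis)
Corresponding author: 郑林宇 (Zheng Linyu)
Recommended citation (GB/T 7714):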
郑林宇. 无边界效应的相关模型视觉跟踪方法研究[D]. 中国科学院大学自动化研究所. 中国科学院大学自动化研究所,2021.
Files in this item
File name/size: 完整版-签字.pdf (19166 KB)
Document type: Dissertation
Access type: Open access
License: CC BY-NC-SA