With the development of computer vision and robotics, tracking human targets from moving platforms has become significant in fields such as unmanned aerial vehicles, human-robot interaction, nursing robots, and service robots. A human-following system consists of a visual tracker operating on the image plane and a motion controller for the moving robot. The core problem of tracking is how to detect and locate the target accurately under changing conditions such as illumination variations, scale variations, occlusions, shape deformation, and camera motion. Meanwhile, tracking is time-critical because it is usually performed on resource-constrained platforms. On the other hand, it is crucial to design controllers that produce smooth control commands for moving robots. This paper focuses on robust and fast visual tracking, as well as motion control for human-following robots. The main work and contributions are as follows:
(1) A scalable and occlusion-aware multi-cue correlation filter human tracker is proposed. By fusing depth with edge and color features, this paper proposes a stereo tracking algorithm named the Scalable and Occlusion-aware Multi-cues Correlation Filter Tracker (SOMCFT), which is mainly composed of a multi-cue correlation filter tracker (MCFT), a scale-handling module, and an occlusion-aware strategy. In the MCFT, the confidence maps derived from all the features are filtered by one another, and an optimal confidence map is then determined by minimizing the sum of Kullback-Leibler (KL) divergences. For scale handling, the target is segmented by a 2D gray-depth histogram, and a quantized set is used to guide the scale variation of the bounding box. In the occlusion-aware strategy, the start and end of an occlusion are detected by combining depth changes with the MCFT results, and a reasonable candidate region is maintained during occlusion. Both qualitative and quantitative evaluations on a Stereo Tracking Dataset (STD) demonstrate that the proposed algorithm performs favorably against the compared methods; SOMCFT outperforms the second-ranked method by 14.59% in precision score.
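To make the fusion step concrete, the sketch below selects the confidence map that minimizes the summed KL divergence to the maps of the other cues; the function names are hypothetical, and the mutual filtering between maps that precedes this selection in SOMCFT is omitted.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-8):
    """KL divergence between two confidence maps, normalized to sum to 1."""
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def select_optimal_map(confidence_maps):
    """Pick the cue whose map has minimal summed KL divergence to the others.

    confidence_maps: list of 2D arrays, one per cue (e.g. depth, edge, color).
    """
    costs = [sum(kl_divergence(m, other)
                 for other in confidence_maps if other is not m)
             for m in confidence_maps]
    return confidence_maps[int(np.argmin(costs))]
```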
(2) A fast and unified convolutional human tracker is proposed. Convolutional neural network (CNN) based tracking approaches have shown favorable performance on recent benchmarks. Nonetheless, the chosen CNN features are usually pre-trained on a different task, and the individual components of the tracking system are learned separately, so the achieved tracking performance and speed may be suboptimal. This paper proposes an end-to-end framework that learns the convolutional features and performs the tracking process simultaneously, namely a Unified Convolutional Tracker (UCT). Specifically, the UCT treats both feature extraction and the tracking process as convolution operations and trains them jointly, so that the learned CNN features are tightly coupled to the tracking process. In online tracking, an efficient updating method is proposed by introducing a peak-versus-noise ratio (PNR) criterion, and scale changes are handled efficiently by incorporating a scale branch into the network. The proposed approach yields superior tracking performance while maintaining real-time speed: the standard UCT and the UCT-lite track objects at 41 FPS and 154 FPS, respectively, without further optimization. Experiments achieve state-of-the-art results on tracking benchmarks compared with other real-time trackers.
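The PNR-based update rule can be illustrated with a minimal sketch; the exact definition of PNR and the thresholds used in UCT may differ from the simplified formulation assumed here.

```python
import numpy as np

def pnr(response):
    """Peak-versus-noise ratio of a correlation response map, taken here as
    the peak height above the minimum divided by the mean elevation of the
    whole map (a simplified stand-in for the criterion in UCT)."""
    r_min = response.min()
    return (response.max() - r_min) / (np.mean(response - r_min) + 1e-8)

def should_update(response, pnr_thresh=10.0, peak_thresh=0.3):
    # Update the model only when the response is confident: a sharp peak
    # (high PNR) that is also high enough in absolute terms.
    return pnr(response) > pnr_thresh and response.max() > peak_thresh
```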
(3) A high-accuracy human tracking algorithm is proposed by utilizing the rich flow information in consecutive frames. Most existing CNN trackers consider only the appearance features of the current frame and hardly benefit from motion and inter-frame information. This lack of temporal information degrades the tracking performance under challenges such as partial occlusion and deformation. This paper proposes FlowTrack, which makes use of the rich flow information in consecutive frames to improve the feature representation and the tracking accuracy. FlowTrack formulates the individual components, including optical flow estimation, feature extraction, aggregation, and correlation filter tracking, as special layers in the network. The historical feature maps at predefined intervals are then warped and aggregated with the current ones under the guidance of flow. For adaptive aggregation, a novel spatial-temporal attention mechanism is proposed. In experiments, the proposed method achieves leading performance on the OTB2013, OTB2015, VOT2015, VOT2016, and PTZ datasets.
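A minimal PyTorch sketch of the flow-guided aggregation is given below; the bilinear warping follows standard practice, while the cosine-similarity softmax weighting is only a stand-in for FlowTrack's spatial-temporal attention, and the function names are assumptions.

```python
import torch
import torch.nn.functional as F

def warp(feat, flow):
    """Warp a historical feature map (N,C,H,W) toward the current frame
    using backward optical flow (N,2,H,W) via bilinear grid sampling."""
    _, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys)).float().unsqueeze(0) + flow  # pixel coords
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    gx = 2.0 * grid[:, 0] / (w - 1) - 1.0
    gy = 2.0 * grid[:, 1] / (h - 1) - 1.0
    return F.grid_sample(feat, torch.stack((gx, gy), dim=-1),
                         align_corners=True)

def aggregate(current, histories, flows):
    """Aggregate warped historical features with the current ones; softmax
    over cosine similarity is a stand-in for spatial-temporal attention."""
    warped = [warp(f, fl) for f, fl in zip(histories, flows)] + [current]
    sims = torch.stack([F.cosine_similarity(w, current, dim=1) for w in warped])
    weights = torch.softmax(sims, dim=0).unsqueeze(2)   # (T, N, 1, H, W)
    return (torch.stack(warped) * weights).sum(dim=0)   # (N, C, H, W)
```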
(4) A motion controller for a stereo pan-tilt platform is proposed. To address the tracking delay and large errors that arise when conventional image-based visual servoing is applied to moving targets, this paper proposes a velocity-compensated image-based visual servo (VC-IBVS) controller, which consists of a basic visual servo sub-controller and a velocity compensation sub-controller. The former eliminates the position error, while the latter accounts for the target velocity. The corresponding Jacobian matrices are derived to implement the controller. In addition, a novel adaptive gain is designed to boost the control law, and a velocity-continuity strategy is adopted to avoid abrupt changes. Extensive experiments conducted and analyzed on a real binocular platform built from off-the-shelf components demonstrate the effectiveness of the proposed method.
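The structure of the controller can be sketched as follows: the classical IBVS term drives the image error to zero, while the compensation term feeds forward the error drift induced by the target's own motion. The adaptive-gain form and all gains below are assumptions for illustration, not the thesis's exact design.

```python
import numpy as np

def adaptive_gain(err, lam0=1.2, lam_inf=0.3, slope=30.0):
    """Gain that is large near zero error and decays for large errors,
    in the spirit of the adaptive gain used to boost the control law."""
    e = np.linalg.norm(err)
    return (lam0 - lam_inf) * np.exp(-slope * e / (lam0 - lam_inf)) + lam_inf

def vc_ibvs(err, L, err_drift):
    """Velocity-compensated IBVS command for the pan-tilt platform.

    err:       image-feature error vector
    L:         image (interaction) Jacobian
    err_drift: estimated time derivative of the error due to target motion
    """
    L_pinv = np.linalg.pinv(L)                    # pseudo-inverse of the Jacobian
    v_servo = -adaptive_gain(err) * L_pinv @ err  # basic visual servo term
    v_comp = -L_pinv @ err_drift                  # velocity compensation term
    return v_servo + v_comp
```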
(5) A monocular-vision-based human tracking and motion control system is built. Combining the visual tracker on the image plane with the motion controller for the moving robot, this paper implements a complete human-following system. Specifically, for visual tracking, a Siamese-based tracker is proposed that uses flow information and regression networks to locate and regress the target simultaneously. Furthermore, the proposed tracker mines hard training samples during offline training, which helps it handle distractors and occlusion in online tracking. For human following, a unified controller is derived for a wheeled mobile robot with a monocular pan-tilt camera, which keeps the target in the field of view while following it. In experiments, the proposed visual tracker significantly outperforms state-of-the-art methods on visual tracking benchmarks. Human-following experiments are conducted and analyzed both in simulation and on a real mobile robot platform, demonstrating the effectiveness and robustness of the overall system.
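As a rough illustration of what the unified controller must accomplish, the proportional stand-in below centers the target with the pan-tilt joints, regulates the following distance with the forward velocity, and turns the robot base to unwind the pan joint; the gains and the decoupled structure are assumptions, not the derived controller.

```python
def following_command(u_err, v_err, pan_angle, depth, depth_des,
                      k_pan=0.8, k_tilt=0.8, k_v=0.6, k_w=1.0):
    """Minimal proportional stand-in for the unified following controller."""
    pan_rate = -k_pan * u_err              # center the target horizontally
    tilt_rate = -k_tilt * v_err            # center the target vertically
    linear_v = k_v * (depth - depth_des)   # keep the desired following distance
    omega = k_w * pan_angle                # base turns to unwind the pan joint
    return pan_rate, tilt_rate, linear_v, omega
```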