With the development of computer vision and robotics, tracking human targets from moving platforms has become significant in fields such as unmanned aerial vehicles, human-robot interaction, nursing robots, and service robots. A human-following system consists of a visual tracker operating on the image plane and a motion controller for the moving robot. The core problem of tracking is how to detect and locate the target accurately under changing conditions such as illumination variations, scale variations, occlusions, shape deformation, and camera motion. Meanwhile, tracking is time-critical because it is usually performed on resource-constrained platforms. On the other hand, it is crucial to design controllers that produce smooth control commands for moving robots. This paper focuses on robust and fast visual tracking, as well as motion control for human-following robots. The main work and contributions are as follows:
(1) A scalable and occlusion-aware multi-cue correlation filter human tracker is proposed. By fusing depth with edge and color features, this paper proposes a stereo tracking algorithm named the Scalable and Occlusion-aware Multi-cues Correlation Filter Tracker (SOMCFT), which is mainly composed of a multi-cue correlation filter tracker (MCFT), a scale-handling module, and an occlusion-aware strategy. In the MCFT, the confidence maps derived from all the features are filtered by one another, and an optimal confidence map is then determined by minimizing the sum of Kullback-Leibler (KL) divergences. For scale handling, the target is segmented by a 2D gray-depth histogram, and a quantized set is used to guide the scale variation of the bounding box. In the occlusion-aware strategy, the start and end of an occlusion are detected by combining depth changes with the MCFT results, and a reasonable candidate region is maintained during occlusion. Both qualitative and quantitative evaluations on a Stereo Tracking Dataset (STD) demonstrate that the proposed algorithm performs favorably against the compared methods; SOMCFT outperforms the second-ranked method by 14.59% in precision score.
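To make the fusion step concrete, the sketch below selects the confidence map that minimizes the summed KL divergence to the maps of the other cues; the function names are hypothetical, and the mutual filtering between maps that precedes this selection in SOMCFT is omitted.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-8):
    """KL divergence between two confidence maps, normalized to sum to 1."""
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def select_optimal_map(confidence_maps):
    """Pick the cue whose map has minimal summed KL divergence to the others.

    confidence_maps: list of 2D arrays, one per cue (e.g. depth, edge, color).
    """
    costs = [sum(kl_divergence(m, other)
                 for other in confidence_maps if other is not m)
             for m in confidence_maps]
    return confidence_maps[int(np.argmin(costs))]
```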
(2) A fast and unified convolutional human tracker is proposed. Convolutional neural network (CNN) based tracking approaches have shown favorable performance on recent benchmarks. Nonetheless, the chosen CNN features are usually pre-trained on a different task, and the individual components of the tracking system are learned separately, so the achieved tracking performance and speed may be suboptimal. This paper proposes an end-to-end framework that learns the convolutional features and performs the tracking process simultaneously, namely a Unified Convolutional Tracker (UCT). Specifically, the UCT treats both feature extraction and the tracking process as convolution operations and trains them jointly, so that the learned CNN features are tightly coupled to the tracking process. In online tracking, an efficient updating method is proposed by introducing a peak-versus-noise ratio (PNR) criterion, and scale changes are handled efficiently by incorporating a scale branch into the network. The proposed approach yields superior tracking performance while maintaining real-time speed: the standard UCT and the UCT-lite track objects at 41 FPS and 154 FPS, respectively, without further optimization. Experiments achieve state-of-the-art results on tracking benchmarks compared with other real-time trackers.
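The PNR-based update rule can be illustrated with a minimal sketch; the exact definition of PNR and the thresholds used in UCT may differ from the simplified formulation assumed here.

```python
import numpy as np

def pnr(response):
    """Peak-versus-noise ratio of a correlation response map, taken here as
    the peak height above the minimum divided by the mean elevation of the
    whole map (a simplified stand-in for the criterion in UCT)."""
    r_min = response.min()
    return (response.max() - r_min) / (np.mean(response - r_min) + 1e-8)

def should_update(response, pnr_thresh=10.0, peak_thresh=0.3):
    # Update the model only when the response is confident: a sharp peak
    # (high PNR) that is also high enough in absolute terms.
    return pnr(response) > pnr_thresh and response.max() > peak_thresh
```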
(3) A high-accuracy human tracking algorithm is proposed by utilizing the rich flow information in consecutive frames. Most existing CNN trackers consider only the appearance features of the current frame and hardly benefit from motion and inter-frame information. This lack of temporal information degrades the tracking performance under challenges such as partial occlusion and deformation. This paper proposes FlowTrack, which makes use of the rich flow information in consecutive frames to improve the feature representation and the tracking accuracy. FlowTrack formulates the individual components, including optical flow estimation, feature extraction, aggregation, and correlation filter tracking, as special layers in the network. The historical feature maps at predefined intervals are then warped and aggregated with the current ones under the guidance of flow. For adaptive aggregation, a novel spatial-temporal attention mechanism is proposed. In experiments, the proposed method achieves leading performance on the OTB2013, OTB2015, VOT2015, VOT2016, and PTZ datasets.
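A minimal PyTorch sketch of the flow-guided aggregation is given below; the bilinear warping follows standard practice, while the cosine-similarity softmax weighting is only a stand-in for FlowTrack's spatial-temporal attention, and the function names are assumptions.

```python
import torch
import torch.nn.functional as F

def warp(feat, flow):
    """Warp a historical feature map (N,C,H,W) toward the current frame
    using backward optical flow (N,2,H,W) via bilinear grid sampling."""
    _, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys)).float().unsqueeze(0) + flow  # pixel coords
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    gx = 2.0 * grid[:, 0] / (w - 1) - 1.0
    gy = 2.0 * grid[:, 1] / (h - 1) - 1.0
    return F.grid_sample(feat, torch.stack((gx, gy), dim=-1),
                         align_corners=True)

def aggregate(current, histories, flows):
    """Aggregate warped historical features with the current ones; softmax
    over cosine similarity is a stand-in for spatial-temporal attention."""
    warped = [warp(f, fl) for f, fl in zip(histories, flows)] + [current]
    sims = torch.stack([F.cosine_similarity(w, current, dim=1) for w in warped])
    weights = torch.softmax(sims, dim=0).unsqueeze(2)   # (T, N, 1, H, W)
    return (torch.stack(warped) * weights).sum(dim=0)   # (N, C, H, W)
```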
(4) A motion controller for a stereo pan-tilt platform is proposed. To address the tracking delay and large errors that arise when conventional image-based visual servoing is applied to moving targets, this paper proposes a velocity-compensated image-based visual servo (VC-IBVS) controller, which consists of a basic visual servo sub-controller and a velocity compensation sub-controller. The former eliminates the position error, while the latter accounts for the target velocity. The corresponding Jacobian matrices are derived to implement the controller. In addition, a novel adaptive gain is designed to boost the control law, and a velocity-continuity strategy is adopted to avoid abrupt changes. Extensive experiments conducted and analyzed on a real binocular platform built from off-the-shelf components demonstrate the effectiveness of the proposed method.
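The structure of the controller can be sketched as follows: the classical IBVS term drives the image error to zero, while the compensation term feeds forward the error drift induced by the target's own motion. The adaptive-gain form and all gains below are assumptions for illustration, not the thesis's exact design.

```python
import numpy as np

def adaptive_gain(err, lam0=1.2, lam_inf=0.3, slope=30.0):
    """Gain that is large near zero error and decays for large errors,
    in the spirit of the adaptive gain used to boost the control law."""
    e = np.linalg.norm(err)
    return (lam0 - lam_inf) * np.exp(-slope * e / (lam0 - lam_inf)) + lam_inf

def vc_ibvs(err, L, err_drift):
    """Velocity-compensated IBVS command for the pan-tilt platform.

    err:       image-feature error vector
    L:         image (interaction) Jacobian
    err_drift: estimated time derivative of the error due to target motion
    """
    L_pinv = np.linalg.pinv(L)                    # pseudo-inverse of the Jacobian
    v_servo = -adaptive_gain(err) * L_pinv @ err  # basic visual servo term
    v_comp = -L_pinv @ err_drift                  # velocity compensation term
    return v_servo + v_comp
```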
(5) A monocular-vision-based human tracking and motion control system is built. Combining the visual tracker on the image plane with the motion controller for the moving robot, this paper implements a complete human-following system. Specifically, for visual tracking, a Siamese-based tracker is proposed that uses flow information and regression networks to locate and regress the target simultaneously. Furthermore, the proposed tracker mines hard training samples during offline training, which helps it handle distractors and occlusion in online tracking. For human following, a unified controller is derived for a wheeled mobile robot with a monocular pan-tilt camera, which keeps the target in the field of view while following it. In experiments, the proposed visual tracker significantly outperforms state-of-the-art methods on visual tracking benchmarks. Human-following experiments are conducted and analyzed both in simulation and on a real mobile robot platform, demonstrating the effectiveness and robustness of the overall system.
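As a rough illustration of what the unified controller must accomplish, the proportional stand-in below centers the target with the pan-tilt joints, regulates the following distance with the forward velocity, and turns the robot base to unwind the pan joint; the gains and the decoupled structure are assumptions, not the derived controller.

```python
def following_command(u_err, v_err, pan_angle, depth, depth_des,
                      k_pan=0.8, k_tilt=0.8, k_v=0.6, k_w=1.0):
    """Minimal proportional stand-in for the unified following controller."""
    pan_rate = -k_pan * u_err              # center the target horizontally
    tilt_rate = -k_tilt * v_err            # center the target vertically
    linear_v = k_v * (depth - depth_des)   # keep the desired following distance
    omega = k_w * pan_angle                # base turns to unwind the pan joint
    return pan_rate, tilt_rate, linear_v, omega
```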