Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark
Wang, Xiao1; Shu, Xiujun1,2; Zhang, Zhipeng3; Jiang, Bo4; Wang, Yaowei1; Tian, Yonghong1,5; Wu, Feng1,6
Conference Name: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Conference Date: 2021-7
Conference Place: Virtual

Tracking by natural language specification is an emerging research topic that aims to locate the target object in a video sequence based on its language description. Compared with traditional bounding box (BBox) based tracking, this setting guides object tracking with high-level semantic information, addresses the ambiguity of BBox, and organically links local and global search. These benefits may bring more flexible, robust, and accurate tracking performance in practical scenarios. However, existing natural language initialized trackers are developed and compared on benchmark datasets proposed for tracking-by-BBox, which cannot reflect the true power of tracking-by-language. In this work, we propose a new benchmark specifically dedicated to tracking-by-language, including a large-scale dataset and strong and diverse baseline methods. Specifically, we collect 2k video sequences (containing a total of 1,244,340 frames and 663 words) and split them 1300/700 for training/testing, respectively. We densely annotate one English sentence and the corresponding bounding boxes of the target object for each video. We also introduce two new challenges into TNL2K for the object tracking task, i.e., adversarial samples and modality switch. A strong baseline method based on an adaptive local-global-search scheme is proposed for future work to compare against. We believe this benchmark will greatly boost related research on natural language guided tracking.

Indexed By: EI
Document Type: Conference paper
Affiliation:
1. Peng Cheng Laboratory
2. School of Electronic and Computer Engineering, Peking University
3. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
4. School of Computer Science and Technology, Anhui University
5. Department of Computer Science and Technology, Peking University
6. University of Science and Technology of China
Recommended Citation
GB/T 7714
Wang, Xiao, Shu, Xiujun, Zhang, Zhipeng, et al. Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark[C], 2021.
Files in This Item:
File Name/Size: TNL2K.pdf (5464KB)
DocType: Conference paper
Access: Open access
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.