A Multi-modal Global Instance Tracking Benchmark (MGIT): Better Locating Target in Complex Spatio-temporal and Causal Relationship
Shiyu, Hu (1,2); Dailing, Zhang (1,2); Meiqi, Wu (3); Xiaokun, Feng (1,2); Xuchen, Li (4); Xin, Zhao (1,2); Kaiqi, Huang (1,2,5)
2023-12
Conference name: The 37th Conference on Neural Information Processing Systems (NeurIPS 2023)
Conference date: 2023-12
Conference location: New Orleans
Abstract

Tracking an arbitrary moving target in a video sequence is the foundation for high-level tasks such as video understanding. Although existing vision-based trackers have demonstrated good tracking capabilities on short video sequences, they always perform poorly in complex environments, such as those represented by the recently proposed global instance tracking task, which consists of longer videos with more complicated narrative content.
Recently, several works have introduced natural language into object tracking, aiming to overcome the limitations of relying on a single visual modality alone. However, the videos they select are still short sequences with simple spatio-temporal and causal relationships, and the semantic descriptions they provide are too simple to characterize the video content. To address these issues, we (1) propose a new multi-modal global instance tracking benchmark named MGIT. It consists of 150 long video sequences with a total of 2.03 million frames, aiming to fully represent the complex spatio-temporal and causal relationships coupled in longer narrative content. (2) Each video sequence is annotated with three semantic grains (i.e., action, activity, and story) to model the progressive process of human cognition. We expect that this multi-granular annotation strategy can provide a favorable environment for multi-modal object tracking research and long video understanding. (3) In addition, we conduct comparative experiments on existing multi-modal object tracking benchmarks, which not only explore the impact of different annotation methods but also validate that our annotation method is a feasible solution for coupling human understanding into semantic labels. (4) Finally, we conduct detailed experimental analyses on MGIT, and we hope that the performance bottlenecks of existing algorithms identified here can support further research in multi-modal object tracking.
The proposed benchmark, experimental results, and toolkit will be released gradually at http://videocube.aitestunion.com/.

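Indexed in: EI
Seven major directions, sub-direction classification: Object Detection, Tracking and Recognition
State Key Laboratory planning-direction classification: Intelligent Capability Evaluation
Associated dataset requiring deposit:
Document type: Conference paper
Item identifier: http://ir.ia.ac.cn/handle/173211/54537
Collection: Laboratory of Cognition and Decision Intelligence for Complex Systems_Intelligent Systems and Engineering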
Author affiliations
1. School of Artificial Intelligence, University of Chinese Academy of Sciences
2.Institute of Automation, Chinese Academy of Sciences
3.School of Computer Science and Technology, University of Chinese Academy of Sciences
4.School of Computer Science, Beijing University of Posts and Telecommunications
5.Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences
First author affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended citation
GB/T 7714
Shiyu, Hu,Dailing, Zhang,Meiqi, Wu,et al. A Multi-modal Global Instance Tracking Benchmark (MGIT): Better Locating Target in Complex Spatio-temporal and causal Relationship[C],2023.
Files in this item
MGIT.pdf (6215 KB): conference paper; open access; license CC BY-NC-SA

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.