Weakly-supervised video object grounding via causal intervention
Wang, Wei (1,2); Gao, Junyu (1,2); Xu, Changsheng (1,2,3)
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Year: 2023
Volume 45, Issue 3, Pages 3933-3948
Abstract

We target the task of weakly-supervised video object grounding (WSVOG), where only video-sentence annotations are available during model learning. It aims to localize the objects described in a sentence to visual regions in the video, a fundamental capability needed in pattern analysis and machine learning. Despite recent progress, existing methods all suffer from the severe problem of spurious association, which harms grounding performance. In this paper, we start from the definition of WSVOG and pinpoint the spurious association from two aspects: (1) the association itself is not object-relevant but extremely ambiguous due to weak supervision; and (2) the association is unavoidably confounded by observational bias when the statistics-based matching strategy of existing methods is adopted. With this in mind, we design a unified causal framework that learns the deconfounded object-relevant association for more accurate and robust video object grounding. Specifically, we learn the object-relevant association by causal intervention from the perspective of the video data generation process. To overcome the lack of fine-grained supervision for the intervention, we propose a novel spatial-temporal adversarial contrastive learning paradigm. To further remove the accompanying confounding effect within the object-relevant association, we pursue the true causality by conducting causal intervention via backdoor adjustment. Finally, the deconfounded object-relevant association is learned and optimized under the unified causal framework in an end-to-end manner. Extensive experiments on both IID and OOD testing sets of three benchmarks demonstrate its accurate and robust grounding performance compared with state-of-the-art methods.
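The abstract refers to causal intervention via backdoor adjustment. As a generic illustration of that textbook formula only, and not the authors' implementation, the sketch below compares the observational quantity P(Y | X) = Σ_z P(Y | X, z) P(z | X) with the interventional quantity P(Y | do(X)) = Σ_z P(Y | X, z) P(z) on a toy discrete example. All variable names and probability tables are hypothetical numbers chosen for illustration; they are not taken from the paper.

```python
# Toy illustration of backdoor adjustment on binary variables X, Y and a confounder Z.
# All probabilities below are hypothetical numbers chosen for illustration;
# they do not come from the paper.

CONFOUNDER_VALUES = (0, 1)

# Hypothetical P(Y=1 | X=x, Z=z), keyed by (x, z).
P_Y1_GIVEN_XZ = {(1, 1): 0.8, (1, 0): 0.6, (0, 1): 0.7, (0, 0): 0.5}

# Hypothetical P(Z=z | X=x), keyed by (z, x): Z co-occurs strongly with X.
P_Z_GIVEN_X = {(1, 1): 0.9, (0, 1): 0.1, (1, 0): 0.1, (0, 0): 0.9}

# Hypothetical prior P(Z=z); consistent with P(X=1) = 0.5 and P_Z_GIVEN_X above.
P_Z = {0: 0.5, 1: 0.5}


def observational(x: int) -> float:
    """P(Y=1 | X=x): each z is weighted by how often it co-occurs with x."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * P_Z_GIVEN_X[(z, x)] for z in CONFOUNDER_VALUES)


def backdoor(x: int) -> float:
    """P(Y=1 | do(X=x)) = sum_z P(Y=1 | x, z) P(z): each z is weighted by its prior."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * P_Z[z] for z in CONFOUNDER_VALUES)


if __name__ == "__main__":
    obs_effect = observational(1) - observational(0)
    causal_effect = backdoor(1) - backdoor(0)
    print(f"observational difference: {obs_effect:.2f}")  # 0.26
    print(f"interventional difference: {causal_effect:.2f}")  # 0.10
```

In this toy setting the observational contrast (0.26) overstates the effect of X relative to the interventional one (0.10), because Z is over-represented when X = 1. How X, Y, and Z map onto the paper's video, sentence, and confounder representations is not specified here.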

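Classification (seven major directions, sub-direction): Image and Video Processing and Analysis
State Key Laboratory planned research direction: Few-Shot and High-Noise Data Learning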
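Document type: Journal article
Item identifier: http://ir.ia.ac.cn/handle/173211/51523
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems
Affiliations:
1. National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
2. School of Artificial Intelligence, University of Chinese Academy of Sciences
3. Peng Cheng Laboratory
First author affiliation: Institute of Automation, Chinese Academy of Sciences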
Recommended citation
GB/T 7714
Wang, Wei, Gao, Junyu, Xu, Changsheng. Weakly-supervised video object grounding via causal intervention[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 3933-3948.
APA: Wang, Wei, Gao, Junyu, & Xu, Changsheng. (2023). Weakly-supervised video object grounding via causal intervention. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(3), 3933-3948.
MLA: Wang, Wei, et al. "Weakly-supervised video object grounding via causal intervention." IEEE Transactions on Pattern Analysis and Machine Intelligence 45.3 (2023): 3933-3948.
Files in this item
File: bare_jrnl_compsoc.pdf (4558 KB); document type: Journal article; version: Author accepted manuscript; access: Open access; license: CC BY-NC-SA

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.