Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception
Gao, Junyu [1,2]; Chen, Mengyuan [1,2]; Xu, Changsheng [1,2,3]
2023
Conference name: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Conference date: 2023-06-18
Conference location: Vancouver, Canada
Abstract

With only video-level event labels, this paper targets the task of weakly-supervised audio-visual event perception (WS-AVEP), which aims to temporally localize and categorize events belonging to each modality. Despite recent progress, most existing approaches either ignore the unsynchronized property of audio-visual tracks or discount the complementary modality for explicit enhancement. We argue that, for an event residing in one modality, the modality itself should provide ample presence evidence of this event, while the other complementary modality is encouraged to afford the absence evidence as a reference signal. To this end, we propose to collect Cross-Modal Presence-Absence Evidence (CMPAE) in a unified framework. Specifically, by leveraging uni-modal and cross-modal representations, a presence-absence evidence collector (PAEC) is designed under Subjective Logic theory. To learn the evidence in a reliable range, we propose a joint-modal mutual learning (JML) process, which calibrates the evidence of diverse audible, visible, and audi-visible events adaptively and dynamically. Extensive experiments show that our method surpasses state-of-the-art methods (e.g., absolute gains of $3.6\%$ and $6.1\%$ in terms of event-level visual and audio metrics). Code is available at github.com/MengyuanChen21/CVPR2023-CMPAE.
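
As a rough illustration of the Subjective Logic idea the abstract refers to, the sketch below maps non-negative presence and absence evidence for one event to a belief, a disbelief, and an uncertainty mass that sum to one. This is the textbook binomial-opinion mapping, not the authors' PAEC implementation; the function name `binomial_opinion` and the default prior weight of 2 are assumptions made for this example.

```python
def binomial_opinion(presence_evidence, absence_evidence, prior_weight=2.0):
    """Map presence/absence evidence to (belief, disbelief, uncertainty).

    Hypothetical helper for illustration only; it is not taken from the
    CMPAE code base. prior_weight=2.0 is the usual non-informative prior
    weight for a two-outcome (present/absent) opinion.
    """
    if presence_evidence < 0 or absence_evidence < 0:
        raise ValueError("evidence must be non-negative")
    total = presence_evidence + absence_evidence + prior_weight
    belief = presence_evidence / total      # mass supporting "event present"
    disbelief = absence_evidence / total    # mass supporting "event absent"
    uncertainty = prior_weight / total      # mass left unassigned
    return belief, disbelief, uncertainty


# With no evidence at all, the opinion is fully uncertain.
print(binomial_opinion(0.0, 0.0))
# Strong presence evidence shrinks the uncertainty mass.
print(binomial_opinion(8.0, 0.0))
```

Under this mapping, collecting more evidence of either kind lowers the uncertainty mass, which is one way to read the abstract's point that absence evidence from the complementary modality can serve as a reference signal.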

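Seven major directions, sub-direction classification: Image and video processing and analysis
State Key Laboratory planned direction classification: Learning from few-shot, high-noise data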
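Document type: Conference paper
Item identifier: http://ir.ia.ac.cn/handle/173211/51577
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems
Author affiliations:
1. State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences (CASIA)
2. School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS)
3. Peng Cheng Laboratory
First author affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended citation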
GB/T 7714
Gao, Junyu,Chen, Mengyuan,Xu, Changsheng. Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception[C],2023.