Relative Alignment Network for Source-Free Multimodal Video Domain Adaptation
Huang Yi 1,2; Yang Xiaoshan 1,2,3; Zhang Ji 4; Xu Changsheng 1,2,3
Date: 2022-10
Conference: ACM International Conference on Multimedia
Proceedings: MM '22: Proceedings of the 30th ACM International Conference on Multimedia
Conference dates: 2022.10.10–2022.10.14
Location: Lisboa, Portugal
Abstract

Video domain adaptation aims to transfer knowledge from labeled source videos to unlabeled target videos. Existing video domain adaptation methods require full access to the source videos to reduce the domain gap between the source and target videos, which is impractical in real scenarios where the source videos are unavailable due to transmission-efficiency or privacy concerns. To address this problem, we propose to solve a source-free domain adaptation task for videos, where only a pre-trained source model and unlabeled target videos are available for learning a multimodal video classification model. Existing source-free domain adaptation methods cannot be directly applied to this task, since videos suffer from domain discrepancy along both the multimodal and temporal aspects, which makes adaptation especially difficult when the source data are unavailable. To deal with these challenges, we propose a Multimodal and Temporal Relative Alignment Network (MTRAN). To explicitly imitate the domain shifts contained in the multimodal information and the temporal dynamics of the source and target videos, we divide the target videos into two subsets according to the self-entropy values of their classification results: low-entropy videos are deemed source-like, while high-entropy videos are deemed target-like. We then adopt a self-entropy-guided MixUp strategy to generate synthetic and hypothetical samples at the instance level from the source-like and target-like videos, and push each synthetic sample to be similar to the corresponding hypothetical sample, which is slightly closer to the source-like videos than the synthetic sample, via multimodal and temporal relative alignment schemes. We evaluate the proposed model on four public video datasets; the results show that it outperforms existing state-of-the-art methods.
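The self-entropy split, entropy-guided MixUp, and relative alignment described in the abstract can be sketched in code. The following is a minimal illustration under stated assumptions, not the authors' released MTRAN implementation: the names `model`, `threshold`, `lam`, and `delta` are hypothetical, the samples are assumed to be pre-paired tensors of equal batch size, and the alignment is collapsed to a single prediction-level KL term rather than the paper's separate multimodal and temporal schemes.

```python
import torch
import torch.nn.functional as F

# Sketch of the self-entropy split, entropy-guided MixUp, and relative
# alignment loss from the abstract. `model`, `threshold`, `lam`, and
# `delta` are illustrative assumptions, not the authors' code.

def self_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of the softmax prediction for each sample."""
    probs = F.softmax(logits, dim=-1)
    return -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1)

@torch.no_grad()
def split_by_entropy(model, videos, threshold):
    """Split unlabeled target videos into source-like (low self-entropy)
    and target-like (high self-entropy) subsets."""
    entropy = self_entropy(model(videos))
    return videos[entropy <= threshold], videos[entropy > threshold]

def mixup_pair(source_like, target_like, lam, delta=0.1):
    """Instance-level MixUp between paired source-like and target-like
    videos. The hypothetical sample uses a slightly larger mixing
    coefficient, so it lies a bit closer to the source-like side and can
    serve as the alignment target for the synthetic sample."""
    lam_hyp = min(lam + delta, 1.0)
    synthetic = lam * source_like + (1.0 - lam) * target_like
    hypothetical = lam_hyp * source_like + (1.0 - lam_hyp) * target_like
    return synthetic, hypothetical

def relative_alignment_loss(model, synthetic, hypothetical):
    """Push the synthetic sample's prediction toward that of the
    hypothetical sample, which sits slightly closer to source-like data."""
    log_p_syn = F.log_softmax(model(synthetic), dim=-1)
    with torch.no_grad():
        p_hyp = F.softmax(model(hypothetical), dim=-1)
    return F.kl_div(log_p_syn, p_hyp, reduction="batchmean")
```

In the paper this alignment is applied along both the multimodal and the temporal dimensions of the videos; the single KL term above only conveys the "relative" idea that the synthetic sample is pulled toward a slightly more source-like target.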

Indexed by: EI
Seven major directions (sub-direction classification): Multimodal Intelligence
State Key Laboratory planning direction classification: Multimodal Collaborative Cognition
Paper-associated dataset requiring deposit:
Document type: Conference paper
Item identifier: http://ir.ia.ac.cn/handle/173211/52094
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems
State Key Laboratory of Multimodal Artificial Intelligence Systems: Multimedia Computing
Corresponding author: Xu Changsheng
作者单位1.Institute of Automation, Chinese Academy of Sciences
2.School of Artificial Intelligence, University of Chinese Academy of Sciences
3.Peng Cheng Laboratory
4.DAMO Academy, Alibaba Group
First author affiliation: Institute of Automation, Chinese Academy of Sciences
Corresponding author affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended citation (GB/T 7714):
Huang Yi, Yang Xiaoshan, Zhang Ji, et al. Relative Alignment Network for Source-Free Multimodal Video Domain Adaptation[C], 2022.
Files in this item:
File name/size: Relative Alignment Network for Source-Free Multimodal Video Domain Adaptation.pdf (1264 KB)
Document type: Conference paper
Access type: Open access
License: CC BY-NC-SA