VAG: A Uniform Model for Cross-Modal Visual-Audio Mutual Generation
Authors: Hao, Wangli [1,2]; Guan, He [1,3]; Zhang, Zhaoxiang [4,5]
Journal: IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
ISSN: 2162-237X
2022-04-08
Pages: 13
Abstract

Considering both audio and visual modalities helps in understanding a video. Under harsh environmental interference or signal packet loss, automatically compensating for the affected audio and visual signals is a challenging task. We propose a dynamic cross-modal visual-audio mutual generation model (VAMG), which covers four generation paths: audio-to-visual conversion, visual-to-audio conversion, audio self-generation, and visual self-generation. VAMG jointly optimizes modal reconstruction and adversarial constraints, effectively addressing structural alignment and signal compensation in incomplete videos. To verify the effectiveness of the model, we conducted instrument-oriented and pose-oriented cross-modal visual-audio mutual generation experiments on the Sub-URMP (University of Rochester Musical Performance) dataset.

Keywords: Task analysis; Instruments; Visualization; Image reconstruction; Generators; Decoding; Generative adversarial networks; Cross modality; cross-modal generation; mutual generation; visual and audio
DOI: 10.1109/TNNLS.2022.3161314
Indexed by: SCI
Language: English
Funding projects: Major Project for New Generation of AI[2018AAA0100400] ; National Natural Science Foundation of China[61836014] ; National Natural Science Foundation of China[U21B2042] ; National Natural Science Foundation of China[62072457] ; National Natural Science Foundation of China[62006231] ; InnoHK Program
Funders: Major Project for New Generation of AI ; National Natural Science Foundation of China ; InnoHK Program
WOS research areas: Computer Science ; Engineering
WOS categories: Computer Science, Artificial Intelligence ; Computer Science, Hardware & Architecture ; Computer Science, Theory & Methods ; Engineering, Electrical & Electronic
WOS accession number: WOS:000782832800001
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Seven major directions, sub-direction classification: Multimodal intelligence
State Key Laboratory planned direction classification: Multimodal collaborative cognition
Whether the paper has associated datasets that need to be deposited:
Citation statistics
Times cited: 2 [WOS]
Document type: Journal article
Item identifier: http://ir.ia.ac.cn/handle/173211/48358
Collection: Pattern Recognition Laboratory
Corresponding author: Zhang, Zhaoxiang
Author affiliations:
1. Chinese Acad Sci CASIA, Ctr Res Intelligent Percept & Comp CRIPAC, Inst Automat, Natl Lab Pattern Recognit NLPR, Beijing 100190, Peoples R China
2. Univ Chinese Acad Sci UCAS, Beijing 100190, Peoples R China
3. Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
4. Univ Chinese Acad Sci, Ctr Res Intelligent Percept & Comp, Inst Automat, Beijing 100190, Peoples R China
5. Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Beijing 101408, Peoples R China
First author affiliation: National Laboratory of Pattern Recognition
Recommended citation
GB/T 7714
Hao, Wangli, Guan, He, Zhang, Zhaoxiang. VAG: A Uniform Model for Cross-Modal Visual-Audio Mutual Generation[J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022: 13.
APA: Hao, Wangli, Guan, He, & Zhang, Zhaoxiang. (2022). VAG: A Uniform Model for Cross-Modal Visual-Audio Mutual Generation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 13.
MLA: Hao, Wangli, et al. "VAG: A Uniform Model for Cross-Modal Visual-Audio Mutual Generation". IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2022): 13.
Files in this item
File name/size: MMVAG.pdf (37909 KB)
Document type: Journal article
Version type: Author accepted manuscript
Access type: Open access
License: CC BY-NC-SA