VAG: A Uniform Model for Cross-Modal Visual-Audio Mutual Generation | |
Hao, Wangli (1, 2) | |
Journal | IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
ISSN | 2162-237X |
2022-04-08 | |
Pages | 13 |
Abstract | Considering both the audio and visual modalities helps in understanding a video. Under harsh environmental interference or signal packet loss, automatically compensating for missing audio or visual content is a challenging task. We propose a dynamic cross-modal visual-audio mutual generation model (VAMG), which comprises audio-to-visual conversion, visual-to-audio conversion, audio self-generation, and visual self-generation. VAMG jointly optimizes modal reconstruction and adversarial constraints, effectively addressing the problems of structural alignment and signal compensation in incomplete videos. We conducted instrument-oriented and pose-oriented cross-modal audio-visual mutual generation experiments on the Sub-URMP dataset, a subset of the University of Rochester Musical Performance (URMP) dataset, to verify the effectiveness of the model. |
Keywords | Task analysis; Instruments; Visualization; Image reconstruction; Generators; Decoding; Generative adversarial networks; Cross modality; cross-modal generation; mutual generation; visual and audio |
DOI | 10.1109/TNNLS.2022.3161314 |
Indexed in | SCI |
Language | English |
Funding projects | Major Project for New Generation of AI [2018AAA0100400]; National Natural Science Foundation of China [61836014]; National Natural Science Foundation of China [U21B2042]; National Natural Science Foundation of China [62072457]; National Natural Science Foundation of China [62006231]; InnoHK Program |
Funders | Major Project for New Generation of AI; National Natural Science Foundation of China; InnoHK Program |
WOS research areas | Computer Science; Engineering |
WOS categories | Computer Science, Artificial Intelligence; Computer Science, Hardware & Architecture; Computer Science, Theory & Methods; Engineering, Electrical & Electronic |
WOS accession number | WOS:000782832800001 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
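Seven major research directions: sub-direction | Multimodal intelligence |
State Key Laboratory planned research direction | Multimodal collaborative cognition |
Associated dataset requiring deposit | No |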
Document type | Journal article |
Item identifier | http://ir.ia.ac.cn/handle/173211/48358 |
Collection | Pattern Recognition Laboratory |
Corresponding author | Zhang, Zhaoxiang |
Author affiliations | 1. Chinese Acad Sci CASIA, Ctr Res Intelligent Percept & Comp CRIPAC, Inst Automat, Natl Lab Pattern Recognit NLPR, Beijing 100190, Peoples R China; 2. Univ Chinese Acad Sci UCAS, Beijing 100190, Peoples R China; 3. Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China; 4. Univ Chinese Acad Sci, Ctr Res Intelligent Percept & Comp, Inst Automat, Beijing 100190, Peoples R China; 5. Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Beijing 101408, Peoples R China |
First author affiliation | National Laboratory of Pattern Recognition |
Recommended citation (GB/T 7714) | Hao, Wangli, Guan, He, Zhang, Zhaoxiang. VAG: A Uniform Model for Cross-Modal Visual-Audio Mutual Generation[J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022: 13. |
APA | Hao, Wangli, Guan, He, & Zhang, Zhaoxiang. (2022). VAG: A Uniform Model for Cross-Modal Visual-Audio Mutual Generation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 13. |
MLA | Hao, Wangli, et al. "VAG: A Uniform Model for Cross-Modal Visual-Audio Mutual Generation". IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2022): 13. |
Files in this item |
File name/size | Document type | Version | Access type | License |
MMVAG.pdf (37909 KB) | Journal article | Author accepted manuscript | Open access | CC BY-NC-SA |
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.