CASIA OpenIR > National Laboratory of Pattern Recognition
CMCGAN: A Uniform Framework for Cross-Modal Visual-Audio Mutual Generation
Wangli Hao1,2; Zhaoxiang Zhang1,2,3; He Guan1
Conference Name: AAAI
Conference Date: 2018.2.1
Conference Place: Hilton New Orleans Riverside, USA

Visual and audio modalities are two symbiotic modalities underlying videos, containing both common and complementary information. If they can be mined and fused sufficiently, the performance of related video tasks can be significantly enhanced. However, due to environmental interference or sensor faults, sometimes only one modality exists while the other is abandoned or missing. By recovering the missing modality from the existing one, based on the common information shared between them and the prior information of the specific modality, great benefit can be gained for various vision tasks. In this paper, we propose a Cross-Modal Cycle Generative Adversarial Network (CMCGAN) to handle cross-modal visual-audio mutual generation. Specifically, CMCGAN is composed of four kinds of subnetworks: audio-to-visual, visual-to-audio, audio-to-audio, and visual-to-visual, which are organized in a cycle architecture. CMCGAN has several remarkable advantages. First, it unifies visual-audio mutual generation into a common framework via a joint corresponding adversarial loss. Second, by introducing a latent vector with a Gaussian distribution, it can effectively handle the dimension and structure asymmetry between the visual and audio modalities. Third, it can be trained end-to-end for greater convenience. Building on CMCGAN, we develop a dynamic multimodal classification network to handle the modality-missing problem. Extensive experiments validate that CMCGAN achieves state-of-the-art cross-modal visual-audio generation results. Furthermore, the generated modality achieves effects comparable to those of the original modality, which demonstrates the effectiveness and advantages of our proposed method.
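To make the cycle organization described in the abstract concrete, the following is a minimal NumPy sketch of how four generation paths (audio-to-visual, visual-to-audio, audio-to-audio, visual-to-visual) can compose through a shared latent space with an added Gaussian latent vector. All names, dimensions, and the linear encoder/decoder maps are illustrative assumptions standing in for the paper's convolutional subnetworks, not the actual CMCGAN architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions): visual and audio features have different
# sizes, illustrating the dimension asymmetry that the shared latent
# Gaussian vector is meant to bridge.
DIM_V, DIM_A, DIM_Z = 32, 16, 8

# Hypothetical linear encoders/decoders into and out of a shared latent space.
W_enc_v = rng.normal(size=(DIM_Z, DIM_V))
W_dec_v = rng.normal(size=(DIM_V, DIM_Z))
W_enc_a = rng.normal(size=(DIM_Z, DIM_A))
W_dec_a = rng.normal(size=(DIM_A, DIM_Z))

def gen(x, W_enc, W_dec):
    """One generation path: encode to the shared latent space,
    perturb with a Gaussian latent vector, then decode."""
    z = W_enc @ x + rng.normal(size=DIM_Z)
    return W_dec @ z

v = rng.normal(size=DIM_V)   # a visual feature
a = rng.normal(size=DIM_A)   # an audio feature

# The four subnetwork paths; cross-modal paths chain into a cycle.
a_from_v = gen(v, W_enc_v, W_dec_a)            # visual-to-audio
v_from_a = gen(a, W_enc_a, W_dec_v)            # audio-to-visual
v_rec    = gen(v, W_enc_v, W_dec_v)            # visual-to-visual
a_rec    = gen(a, W_enc_a, W_dec_a)            # audio-to-audio
v_cycle  = gen(a_from_v, W_enc_a, W_dec_v)     # v -> a -> v
a_cycle  = gen(v_from_a, W_enc_v, W_dec_a)     # a -> v -> a

# Cycle-consistency errors that adversarial training would drive down
# (here just computed, not optimized).
loss_v = float(np.mean((v_cycle - v) ** 2))
loss_a = float(np.mean((a_cycle - a) ** 2))
```

Note how each cross-modal path lands in the other modality's dimensionality (a `DIM_V` input yields a `DIM_A` output), which is the asymmetry the shared latent space resolves; training the real model would additionally attach discriminators and the joint corresponding adversarial loss to these paths.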


Document Type: Conference paper
Corresponding Author: Zhaoxiang Zhang
Affiliation: 1. Center of Research on Intelligent Perception and Computing
2. Institute of Automation, University of Chinese Academy of Sciences
3. Center for Excellence in Brain Science and Intelligence Technology (CEBSIT)
4. CAS Center for Excellence in Brain Science and Intelligence
Recommended Citation
GB/T 7714
Wangli Hao, Zhaoxiang Zhang, He Guan. CMCGAN: A Uniform Framework for Cross-Modal Visual-Audio Mutual Generation[C], 2018.
Files in This Item:
2--CMCGAN A Uniform (2675KB), Conference paper, Open Access, CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.