Everybody’s Talkin’: Let Me Talk as You Want
Linsen Song1,2; Wayne Wu3; Chen Qian3; Ran He1,2; Chen Change Loy4
Source Publication: IEEE Transactions on Information Forensics and Security
ISSN: 1556-6013
Date: 2022-01-26
Volume: 17  Issue: 1  Pages: 585-598
Abstract

We present a method that edits target portrait footage, taking a sequence of audio as input, to synthesize a photo-realistic video. The method is highly dynamic: it does not assume a person-specific rendering network, yet it can translate one source audio clip into one randomly chosen video output from a set of speech videos. Instead of learning a highly heterogeneous and nonlinear mapping directly from audio to video, we first factorize each target video frame into orthogonal parameter spaces, i.e., expression, geometry, and pose, via monocular 3D face reconstruction. Next, a recurrent network translates the source audio into expression parameters, which are primarily related to the audio content. These audio-translated expression parameters are then used to synthesize a photo-realistic human subject in each video frame, with the mouth movements precisely mapped to the source audio. The geometry and pose parameters of the target human portrait are retained, thereby preserving the context of the original video footage. Finally, we introduce a novel video rendering network and a dynamic programming method to construct a temporally coherent, photo-realistic video. Extensive experiments demonstrate the superiority of our method over existing approaches. Our method is end-to-end learnable and robust to voice variations in the source audio.
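The central idea of the abstract, keeping each target frame's geometry and pose while replacing only its expression parameters with ones predicted from audio, can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: `FrameParams`, `recombine`, and `toy_recurrent_predictor` are hypothetical names, the one-unit recurrent predictor is a placeholder for the paper's trained recurrent network, and the rendering network and dynamic programming step are omitted.

```python
import math
from dataclasses import dataclass, replace
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class FrameParams:
    """Per-frame parameters from a monocular 3D face fit (field layout is assumed)."""
    expression: Vector  # audio-related coefficients: replaced by the prediction
    geometry: Vector    # face shape of the target person: kept as-is
    pose: Vector        # head rotation/translation: kept as-is


def toy_recurrent_predictor(audio_features: Sequence[Sequence[float]],
                            w_x: float = 1.0, w_h: float = 0.5) -> List[Vector]:
    """Hypothetical placeholder for the paper's recurrent network: a one-unit
    Elman-style RNN, h_t = tanh(w_x * mean(x_t) + w_h * h_{t-1})."""
    h = 0.0
    outputs: List[Vector] = []
    for x_t in audio_features:
        h = math.tanh(w_x * (sum(x_t) / len(x_t)) + w_h * h)
        outputs.append((h,))  # one expression coefficient per frame in this toy
    return outputs


def recombine(target_frames: Sequence[FrameParams],
              audio_features: Sequence[Sequence[float]],
              audio_to_expression: Callable[[Sequence[Sequence[float]]], List[Vector]],
              ) -> List[FrameParams]:
    """Swap in audio-driven expression; keep each frame's geometry and pose."""
    if len(target_frames) != len(audio_features):
        raise ValueError("expected one audio feature vector per video frame")
    predicted = audio_to_expression(audio_features)
    # dataclasses.replace returns a new frozen instance; originals are untouched.
    return [replace(frame, expression=expr)
            for frame, expr in zip(target_frames, predicted)]


# Usage: three target frames whose geometry and pose must survive the edit.
frames = [FrameParams(expression=(0.0,), geometry=(1.0, 2.0), pose=(0.1, 0.0, 0.0))
          for _ in range(3)]
audio = [[0.5, 0.5], [1.0, 1.0], [0.0, 0.0]]
edited = recombine(frames, audio, toy_recurrent_predictor)
```

Passing the predictor as a callable keeps the recombination step independent of how expressions are predicted, so the toy RNN could be swapped for a trained model without changing `recombine`.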

Keywords: Talking face generation; Video generation; GAN; Audio dubbing
DOI: 10.1109/TIFS.2022.3146783
Indexed By: SCI
Language: English
Sub-direction Classification: Image and Video Processing and Analysis
Planning Direction of the National Key Laboratory: Visual Information Processing
Citation Statistics: Cited 25 times (Web of Science)
Document Type: Journal article
Identifier: http://ir.ia.ac.cn/handle/173211/52260
Collection: Pattern Recognition Laboratory
Corresponding Author: Loy, Chen Change
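Affiliation: 1. Institute of Automation, Chinese Academy of Sciences
2. University of Chinese Academy of Sciences
3. SenseTime Co., Ltd.
4. Nanyang Technological University
First Author Affiliation: Institute of Automation, Chinese Academy of Sciences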
Recommended Citation
GB/T 7714: SONG L, WU W, QIAN C, et al. Everybody's Talkin': Let Me Talk as You Want[J]. IEEE Transactions on Information Forensics and Security, 2022, 17(1): 585-598.
APA: Song, L., Wu, W., Qian, C., He, R., & Loy, C. C. (2022). Everybody's Talkin': Let Me Talk as You Want. IEEE Transactions on Information Forensics and Security, 17(1), 585-598.
MLA: Song, Linsen, et al. "Everybody's Talkin': Let Me Talk as You Want." IEEE Transactions on Information Forensics and Security 17.1 (2022): 585-598.
Files in This Item:
File Name: Everybodys_Talkin_Le; Size: 15432 KB; Doc Type: Journal article; Version: Author's accepted manuscript; Access: Open access; License: CC BY-NC-SA
File name: Everybodys_Talkin_Let_Me_Talk_as_You_Want.pdf
Format: Adobe PDF

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.