CASIA OpenIR > Center for Research on Intelligent Perception and Computing
Audio-driven Dubbing for User Generated Contents via Style-aware Semi-parametric Synthesis
Song LS(宋林森)1,3; Wu WY(吴文岩)2; Fu CY(傅朝友)1,3; Loy, Chen Change4; He R(赫然)1,3
Source Publication: IEEE Transactions on Circuits and Systems for Video Technology
Date Issued: 2022-09-26
Volume: 33, Issue: 3, Pages: 1247-1261
Abstract

Existing automated dubbing methods are usually designed for Professionally Generated Content (PGC) production, which requires massive training data and training time to learn a person-specific audio-video mapping. In this paper, we investigate an audio-driven dubbing method that is more feasible for User Generated Content (UGC) production. There are two unique challenges in designing a method for UGC: 1) the appearances of speakers are diverse and arbitrary, as the method needs to generalize across users; 2) the available video data for any one speaker are very limited. To tackle these challenges, we first introduce a new Style Translation Network that integrates the speaking style of the target and the speaking content of the source via a cross-modal AdaIN module, enabling our model to quickly adapt to a new speaker. We then develop a semi-parametric video renderer, which takes full advantage of the limited training data of the unseen speaker via a video-level retrieve-warp-refine pipeline. Finally, we propose a temporal regularization for the semi-parametric renderer, generating more continuous videos. Extensive experiments show that our method generates videos that accurately preserve various speaking styles, yet with considerably less training data and training time than existing methods. In addition, our method achieves a faster testing speed than most recent methods.
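The abstract's cross-modal AdaIN module injects a target speaker's style into audio-derived content features. A minimal sketch of the general AdaIN mechanism is shown below; the function name, shapes, and affine parameterization are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def cross_modal_adain(content, style, w_affine, b_affine, eps=1e-5):
    """Illustrative AdaIN (assumed form, not the paper's code):
    per-channel normalize the content features, then re-scale/shift
    them with gamma/beta predicted from a speaker style embedding."""
    # content: (C, T) content features over time; style: (S,) style code
    params = w_affine @ style + b_affine          # (2C,) predicted stats
    gamma, beta = np.split(params, 2)             # (C,) each
    mean = content.mean(axis=1, keepdims=True)    # per-channel mean
    std = content.std(axis=1, keepdims=True) + eps
    return gamma[:, None] * (content - mean) / std + beta[:, None]

rng = np.random.default_rng(0)
C, T, S = 64, 100, 16
out = cross_modal_adain(rng.standard_normal((C, T)),
                        rng.standard_normal(S),
                        rng.standard_normal((2 * C, S)),
                        rng.standard_normal(2 * C))
print(out.shape)  # (64, 100)
```

Because the style code fully determines gamma and beta, swapping in a new speaker's style embedding re-styles the same content features without retraining the content branch, which is consistent with the quick-adaptation claim in the abstract.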

Keywords: Talking Face Generation; Video Generation; GAN; Thin-plate Spline
MOST Discipline Catalogue: Engineering; Engineering::Computer Science and Technology (degrees awardable in Engineering or Science)
DOI: 10.1109/TCSVT.2022.3210002
URL: View Full Text
Indexed By: SSCI
Language: English
Sub-direction Classification: Image/Video Processing and Analysis
Planning Direction of the National Key Laboratory: Visual Information Processing
Citation Statistics
Cited Times (WOS): 1
Document Type: Journal article
Identifier: http://ir.ia.ac.cn/handle/173211/52261
Collection: Center for Research on Intelligent Perception and Computing
Corresponding Author: He R(赫然)
Affiliation:
1. Institute of Automation, Chinese Academy of Sciences
2. Beijing SenseTime Technology Co., Ltd.
3. University of Chinese Academy of Sciences
4. Nanyang Technological University
First Author Affiliation: Institute of Automation, Chinese Academy of Sciences
Corresponding Author Affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended Citation
GB/T 7714: Song LS, Wu WY, Fu CY, et al. Audio-driven Dubbing for User Generated Contents via Style-aware Semi-parametric Synthesis[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 33(3): 1247-1261.
APA: Song LS, Wu WY, Fu CY, Loy, Chen Change, & He R. (2022). Audio-driven Dubbing for User Generated Contents via Style-aware Semi-parametric Synthesis. IEEE Transactions on Circuits and Systems for Video Technology, 33(3), 1247-1261.
MLA: Song LS, et al. "Audio-driven Dubbing for User Generated Contents via Style-aware Semi-parametric Synthesis." IEEE Transactions on Circuits and Systems for Video Technology 33.3 (2022): 1247-1261.
Files in This Item:
File Name/Size: Audio-Driven_Dubbing (8629 KB)
DocType: Journal article (author accepted manuscript); Access: Open Access; License: CC BY-NC-SA
File name: Audio-Driven_Dubbing_for_User_Generated_Contents_via_Style-Aware_Semi-Parametric_Synthesis-2.pdf
Format: Adobe PDF
 

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.