CASIA OpenIR > Pattern Recognition Laboratory
Audio-driven Dubbing for User Generated Contents via Style-aware Semi-parametric Synthesis
Song LS(宋林森)1,3; Wu WY(吴文岩)2; Fu CY(傅朝友)1,3; Loy, Chen Change4; He R(赫然)1,3
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Publication date: 2022-09-26
Volume: 33, Issue: 3, Pages: 1247-1261
Abstract

Existing automated dubbing methods are usually designed for Professionally Generated Content (PGC) production, which requires massive training data and long training time to learn a person-specific audio-video mapping. In this paper, we investigate an audio-driven dubbing method that is more feasible for User Generated Content (UGC) production. There are two unique challenges in designing a method for UGC: 1) the appearances of speakers are diverse and arbitrary, as the method needs to generalize across users; 2) the available video data for any one speaker are very limited. To tackle these challenges, we first introduce a new Style Translation Network that integrates the speaking style of the target and the speaking content of the source via a cross-modal AdaIN module. It enables our model to quickly adapt to a new speaker. We then develop a semi-parametric video renderer, which takes full advantage of the limited training data of the unseen speaker via a video-level retrieve-warp-refine pipeline. Finally, we propose a temporal regularization for the semi-parametric renderer, generating more continuous videos. Extensive experiments show that our method generates videos that accurately preserve various speaking styles, yet with a considerably smaller amount of training data and shorter training time than existing methods. Moreover, our method achieves a faster testing speed than most recent methods.
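The cross-modal AdaIN module mentioned in the abstract builds on the standard Adaptive Instance Normalization operation, which re-normalizes content features to take on the per-channel statistics of style features. The following is a minimal illustrative sketch of plain AdaIN only (shapes and names are hypothetical; it is not the authors' implementation, which conditions on audio and style inputs across modalities):

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive Instance Normalization (sketch).

    content, style: arrays of shape (channels, length).
    The content features are whitened per channel, then re-scaled and
    re-shifted to match the style features' per-channel mean and std.
    """
    c_mean = content.mean(axis=1, keepdims=True)
    c_std = content.std(axis=1, keepdims=True)
    s_mean = style.mean(axis=1, keepdims=True)
    s_std = style.std(axis=1, keepdims=True)
    normalized = (content - c_mean) / (c_std + eps)  # zero mean, unit std
    return normalized * s_std + s_mean               # adopt style statistics
```

After this operation, each channel of the output carries the style features' mean and standard deviation while retaining the content features' spatial/temporal structure; in the paper's setting this is what lets a lightweight module transfer a target speaker's style onto source-driven content.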

Keywords: Talking Face Generation; Video Generation; GAN; Thin-plate Spline
Subject category: Engineering; Engineering::Computer Science and Technology (degrees conferrable in Engineering or Science)
DOI: 10.1109/TCSVT.2022.3210002
Indexed by: SSCI
Language: English
Sub-direction classification: Image and Video Processing and Analysis
State Key Laboratory planned research direction: Visual Information Processing
Associated dataset requiring deposit: none specified
Citation statistics
Times cited: 1 (WOS)
Document type: Journal article
Item identifier: http://ir.ia.ac.cn/handle/173211/52261
Collection: Pattern Recognition Laboratory
Corresponding author: He R (赫然)
Author affiliations:
1. Institute of Automation, Chinese Academy of Sciences
2. SenseTime (Beijing) Technology Co., Ltd.
3. University of Chinese Academy of Sciences
4. Nanyang Technological University
First author's affiliation: Institute of Automation, Chinese Academy of Sciences
Corresponding author's affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended citation:
GB/T 7714
Song LS, Wu WY, Fu CY, et al. Audio-driven Dubbing for User Generated Contents via Style-aware Semi-parametric Synthesis[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 33(3): 1247-1261.
APA: Song LS, Wu WY, Fu CY, Loy, Chen Change, & He R. (2022). Audio-driven Dubbing for User Generated Contents via Style-aware Semi-parametric Synthesis. IEEE Transactions on Circuits and Systems for Video Technology, 33(3), 1247-1261.
MLA: Song LS, et al. "Audio-driven Dubbing for User Generated Contents via Style-aware Semi-parametric Synthesis." IEEE Transactions on Circuits and Systems for Video Technology 33.3 (2022): 1247-1261.
Files in this item:
File name/size: Audio-Driven_Dubbing (8629 KB) | Document type: Journal article | Version: Author's accepted manuscript | Access: Open access | License: CC BY-NC-SA
File name: Audio-Driven_Dubbing_for_User_Generated_Contents_via_Style-Aware_Semi-Parametric_Synthesis-2.pdf
Format: Adobe PDF

Unless otherwise noted, all content in this system is protected by copyright, with all rights reserved.