Improving Deep Neural Network Based Speech Synthesis through Contextual Feature Parametrization and Multi-Task Learning
Wen, Zhengqi (1); Li, Kehuang (2); Huang, Zhen (2); Lee, Chin-Hui (2); Tao, Jianhua (1,3,4)
2018-07-01
Journal: JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY
Volume: 90; Issue: 7; Pages: 1025-1037
Article type: Article
Abstract: We propose three techniques to improve speech synthesis based on deep neural networks (DNNs). First, at the DNN input, we use a real-valued contextual feature vector to represent phoneme identity, part of speech, and pause information instead of the conventional binary vector. Second, at the DNN output layer, parameters for the pitch-scaled spectrum and aperiodicity measures are estimated for constructing the excitation signal used in our baseline synthesis vocoder. Third, a bidirectional recurrent neural network architecture with long short-term memory (BLSTM) units is adopted and trained with multi-task learning for DNN-based speech synthesis. Experimental results demonstrate that the new input vector and output parameters improve the quality of synthesized speech. The proposed BLSTM architecture is also beneficial for learning the mapping function from the input contextual features to the speech parameters and for improving speech quality.
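The first technique in the abstract, replacing a binary input vector with a real-valued one, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the toy phoneme inventory, the vector size, and the randomly initialised table that stands in for whatever real-valued representation the authors actually derived. The abstract does not describe how the real-valued vectors are obtained, nor how part of speech and pause information are encoded.

```python
import random

# Hypothetical toy phoneme inventory; the paper's actual inventory is not given here.
PHONEMES = ["a", "i", "u", "sil"]
EMBED_DIM = 3  # assumed vector size, chosen only for illustration


def one_hot(phoneme):
    """Conventional binary encoding: one element per phoneme, exactly one set to 1."""
    return [1.0 if p == phoneme else 0.0 for p in PHONEMES]


def make_embedding_table(seed=0):
    """Random real-valued vectors standing in for a learned representation (assumption)."""
    rng = random.Random(seed)
    return {p: [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for p in PHONEMES}


def embed(phoneme, table):
    """Real-valued encoding: look up a dense vector for the phoneme."""
    return table[phoneme]


table = make_embedding_table()
binary_vec = one_hot("i")      # length grows with the size of the inventory
dense_vec = embed("i", table)  # length is fixed at EMBED_DIM

print("binary:", binary_vec)
print("real-valued length:", len(dense_vec))
```

In this sketch the binary vector has one element per phoneme, so its length grows with the inventory, while the real-valued vector keeps a fixed, smaller size and can place similar phonemes near each other once the values are learned rather than random.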
Keywords: DNN-based Speech Synthesis; Vocoder; Speech Parametrization; BLSTM; Phoneme Embedded Vector; Multi-task Learning; Pitch-scaled Spectrum
WOS headings: Science & Technology ; Technology
DOI: 10.1007/s11265-017-1293-z
Keywords [WOS]: RECOGNITION ; REPRESENTATIONS ; DIVERGENCE ; EXTRACTION ; GENERATION ; SELECTION
Indexed by: SCI
Language: English
Funding: National High-Tech Research and Development Program of China (863 Program)(2015AA016305) ; National Natural Science Foundation of China (NSFC)(61403386) ; Strategic Priority Research Program of the CAS(XDB02080006) ; Major Program for the National Social Science Fund of China(13ZD189)
WOS research areas: Computer Science ; Engineering
WOS categories: Computer Science, Information Systems ; Engineering, Electrical & Electronic
WOS accession number: WOS:000433555600007
Document type: Journal article
Identifier: http://ir.ia.ac.cn/handle/173211/19932
Collection: National Laboratory of Pattern Recognition_Speech Interaction
Corresponding author: Zhengqi Wen
Affiliations:
1. Natl Lab Pattern Recognit, Beijing, Peoples R China
2. Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
3. Chinese Acad Sci, Inst Automat, CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing, Peoples R China
4. Univ Chinese Acad Sci, Sch Comp & Control Engn, Beijing, Peoples R China
Recommended citation
GB/T 7714: Wen, Zhengqi, Li, Kehuang, Huang, Zhen, et al. Improving Deep Neural Network Based Speech Synthesis through Contextual Feature Parametrization and Multi-Task Learning[J]. JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2018, 90(7): 1025-1037.
APA: Wen, Zhengqi, Li, Kehuang, Huang, Zhen, Lee, Chin-Hui, & Tao, Jianhua. (2018). Improving Deep Neural Network Based Speech Synthesis through Contextual Feature Parametrization and Multi-Task Learning. JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 90(7), 1025-1037.
MLA: Wen, Zhengqi, et al. "Improving Deep Neural Network Based Speech Synthesis through Contextual Feature Parametrization and Multi-Task Learning". JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY 90.7 (2018): 1025-1037.
Files in this item
File name/size: 2017_Journal of Sina (995KB); Document type: Journal article; Version: Author accepted manuscript; Access: Open access; License: CC BY-NC-SA
File name: 2017_Journal of Sinal Processing_SCI wenzhengqi.pdf
Format: Adobe PDF