Improving Deep Neural Network Based Speech Synthesis through Contextual Feature Parametrization and Multi-Task Learning | |
Wen, Zhengqi1; Li, Kehuang2; Huang, Zhen2; Lee, Chin-Hui2; Tao, Jianhua1,3,4; Zhengqi Wen | |
Journal | JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY
Publication date | 2018-07-01 |
Volume | 90; Issue: 7; Pages: 1025-1037 |
Article type | Article |
Abstract | We propose three techniques to improve speech synthesis based on deep neural networks (DNNs). First, at the DNN input, we use a real-valued contextual feature vector to represent phoneme identity, part of speech, and pause information instead of the conventional binary vector. Second, at the DNN output layer, parameters for the pitch-scaled spectrum and aperiodicity measures are estimated to construct the excitation signal used in our baseline synthesis vocoder. Third, a bidirectional recurrent neural network architecture with long short-term memory (BLSTM) units is adopted and trained with multi-task learning for DNN-based speech synthesis. Experimental results demonstrate that the new input vector and output parameters improve the quality of synthesized speech. The proposed BLSTM architecture also helps the network learn the mapping from the input contextual features to the speech parameters and improves speech quality. |
Keywords | DNN-based Speech Synthesis ; Vocoder ; Speech Parametrization ; BLSTM ; Phoneme Embedded Vector ; Multi-task Learning ; Pitch-scaled Spectrum |
WOS headings | Science & Technology ; Technology |
DOI | 10.1007/s11265-017-1293-z |
Keywords [WOS] | RECOGNITION ; REPRESENTATIONS ; DIVERGENCE ; EXTRACTION ; GENERATION ; SELECTION |
Indexed by | SCI |
Language | English |
Funding organization | National High-Tech Research and Development Program of China (863 Program)(2015AA016305) ; National Natural Science Foundation of China (NSFC)(61403386) ; Strategic Priority Research Program of the CAS(XDB02080006) ; Major Program for the National Social Science Fund of China(13ZD189) |
WOS research area | Computer Science ; Engineering |
WOS subject category | Computer Science, Information Systems ; Engineering, Electrical & Electronic |
WOS accession number | WOS:000433555600007 |
Document type | Journal article |
Item identifier | http://ir.ia.ac.cn/handle/173211/19932 |
Collection | National Laboratory of Pattern Recognition_Speech Interaction |
Corresponding author | Zhengqi Wen |
Affiliation | 1. Natl Lab Pattern Recognit, Beijing, Peoples R China ; 2. Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA ; 3. Chinese Acad Sci, Inst Automat, CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing, Peoples R China ; 4. Univ Chinese Acad Sci, Sch Comp & Control Engn, Beijing, Peoples R China |
Recommended citation GB/T 7714 | Wen, Zhengqi,Li, Kehuang,Huang, Zhen,et al. Improving Deep Neural Network Based Speech Synthesis through Contextual Feature Parametrization and Multi-Task Learning[J]. JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY,2018,90(7):1025-1037. |
APA | Wen, Zhengqi,Li, Kehuang,Huang, Zhen,Lee, Chin-Hui,Tao, Jianhua,&Zhengqi Wen.(2018).Improving Deep Neural Network Based Speech Synthesis through Contextual Feature Parametrization and Multi-Task Learning.JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY,90(7),1025-1037. |
MLA | Wen, Zhengqi,et al."Improving Deep Neural Network Based Speech Synthesis through Contextual Feature Parametrization and Multi-Task Learning".JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY 90.7(2018):1025-1037. |
Files in this item | |
File name/size | Document type | Version type | Access type | License |
2017_Journal of Sina(995KB) | Journal article | Author accepted manuscript | Open access | CC BY-NC-SA |