Improving Deep Neural Network Based Speech Synthesis through Contextual Feature Parametrization and Multi-Task Learning
Wen, Zhengqi1; Li, Kehuang2; Huang, Zhen2; Lee, Chin-Hui2; Tao, Jianhua1,3,4
Source Publication: JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY
2018-07-01
Volume: 90; Issue: 7; Pages: 1025-1037
Subtype: Article
Abstract: We propose three techniques to improve speech synthesis based on deep neural networks (DNNs). First, at the DNN input, we use a real-valued contextual feature vector to represent phoneme identity, part of speech, and pause information instead of the conventional binary vector. Second, at the DNN output layer, parameters for the pitch-scaled spectrum and aperiodicity measures are estimated for constructing the excitation signal used in our baseline synthesis vocoder. Third, a bidirectional recurrent neural network architecture with long short-term memory (BLSTM) units is adopted and trained with multi-task learning for DNN-based speech synthesis. Experimental results demonstrate that the quality of synthesized speech is improved by adopting the new input vector and output parameters. The proposed BLSTM architecture is also beneficial for learning the mapping function from the input contextual features to the speech parameters and for improving speech quality.
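To make the first technique in the abstract concrete, the following minimal sketch contrasts a conventional binary (one-hot) phoneme encoding with a real-valued vector lookup. It is an illustration under stated assumptions, not the authors' implementation: the four-phoneme inventory, the vector dimension, and the function names are hypothetical, and random numbers stand in for values that would be learned from data.

```python
import random

# Hypothetical toy phoneme inventory (assumption; not taken from the paper).
PHONEMES = ["a", "i", "u", "sil"]


def one_hot(phoneme):
    """Conventional binary encoding: one dimension per phoneme in the inventory."""
    return [1.0 if p == phoneme else 0.0 for p in PHONEMES]


def make_embedding_table(dim, seed=0):
    """Real-valued alternative: map each phoneme to a dense vector of size `dim`.

    Random values are placeholders for vectors that would be learned from data.
    """
    rng = random.Random(seed)
    return {p: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for p in PHONEMES}


def embed(phoneme, table):
    """Look up the real-valued vector for a phoneme."""
    return list(table[phoneme])


table = make_embedding_table(dim=2)
binary_input = one_hot("i")       # length grows with the inventory size
real_input = embed("i", table)    # length is fixed by the chosen dimension
print(binary_input)
print(real_input)
```

In this sketch the binary vector has one entry per phoneme, while the real-valued vector has a fixed, smaller size chosen independently of the inventory.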
Keyword: DNN-based Speech Synthesis; Vocoder; Speech Parametrization; BLSTM; Phoneme Embedded Vector; Multi-task Learning; Pitch-scaled Spectrum
WOS Headings: Science & Technology; Technology
DOI: 10.1007/s11265-017-1293-z
WOS Keyword: RECOGNITION; REPRESENTATIONS; DIVERGENCE; EXTRACTION; GENERATION; SELECTION
Indexed By: SCI
Language: English
Funding Organization: National High-Tech Research and Development Program of China (863 Program) (2015AA016305); National Natural Science Foundation of China (NSFC) (61403386); Strategic Priority Research Program of the CAS (XDB02080006); Major Program for the National Social Science Fund of China (13ZD189)
WOS Research Area: Computer Science; Engineering
WOS Subject: Computer Science, Information Systems; Engineering, Electrical & Electronic
WOS ID: WOS:000433555600007
Document Type: Journal article
Identifier: http://ir.ia.ac.cn/handle/173211/19932
Collection: National Laboratory of Pattern Recognition, Speech Interaction
Corresponding Author: Zhengqi Wen
Affiliation:
1. Natl Lab Pattern Recognit, Beijing, Peoples R China
2. Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
3. Chinese Acad Sci, Inst Automat, CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing, Peoples R China
4. Univ Chinese Acad Sci, Sch Comp & Control Engn, Beijing, Peoples R China
Recommended Citation
GB/T 7714: Wen, Zhengqi, Li, Kehuang, Huang, Zhen, et al. Improving Deep Neural Network Based Speech Synthesis through Contextual Feature Parametrization and Multi-Task Learning[J]. JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2018, 90(7): 1025-1037.
APA: Wen, Zhengqi, Li, Kehuang, Huang, Zhen, Lee, Chin-Hui, & Tao, Jianhua. (2018). Improving Deep Neural Network Based Speech Synthesis through Contextual Feature Parametrization and Multi-Task Learning. JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 90(7), 1025-1037.
MLA: Wen, Zhengqi, et al. "Improving Deep Neural Network Based Speech Synthesis through Contextual Feature Parametrization and Multi-Task Learning." JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY 90.7 (2018): 1025-1037.
Files in This Item:
File Name/Size: 2017_Journal of Sina (995KB); DocType: Journal article; Version: Author's accepted manuscript; Access: Open access; License: CC BY-NC-SA
File name: 2017_Journal of Sinal Processing_SCI wenzhengqi.pdf
Format: Adobe PDF
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.