On The Application and Compression of Deep Time Delay Neural Network for Embedded Statistical Parametric Speech Synthesis
Yibin Zheng (1,2); Jianhua Tao (1,2); Zhengqi Wen (1); Ruibo Fu (1,2)
2018-09
Conference Name: Annual Conference of the International Speech Communication Association (Interspeech)
Conference Date: 2-6 September 2018
Conference Place: Hyderabad
Abstract

Acoustic models based on long short-term memory (LSTM) recurrent neural networks (RNNs) have been applied to statistical parametric speech synthesis (SPSS) and have shown significant improvements. However, the model complexity and inference time cost of RNNs are much higher than those of feed-forward neural networks (FNNs) due to the sequential nature of the learning algorithm, which limits their usage in many runtime applications. In this paper, we explore a novel application of the deep time delay neural network (TDNN) to embedded SPSS, which requires a low disk footprint, low memory usage, and low latency. The TDNN can model long- and short-term temporal dependencies with an inference cost comparable to that of a standard FNN. Temporal subsampling enabled by the TDNN can reduce computational complexity. We then compress the deep TDNN using singular value decomposition (SVD) to further reduce model complexity, motivated by the goal of building embedded SPSS systems that run efficiently on mobile devices. Both objective and subjective experimental results show that the proposed deep TDNN with SVD compression can generate synthesized speech with better quality than the FNN and quality comparable to the LSTM, while drastically reducing model complexity and speech parameter generation time.
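
As a rough illustration of the SVD compression idea mentioned in the abstract, the sketch below factors one weight matrix into two low-rank factors and compares parameter counts. The function name `svd_compress`, the layer sizes, and the rank are hypothetical and chosen only for illustration; this record does not state which layers the authors compress, which ranks they use, or whether the model is fine-tuned afterwards.

```python
import numpy as np

def svd_compress(weight, rank):
    """Approximate an (out_dim x in_dim) matrix by two low-rank factors.

    Returns (left, right) such that left @ right approximates weight.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    # Keep the `rank` largest singular values and fold them into the left factor.
    left = u[:, :rank] * s[:rank]   # shape (out_dim, rank)
    right = vt[:rank, :]            # shape (rank, in_dim)
    return left, right

rng = np.random.default_rng(0)

# Hypothetical layer dimensions and rank (assumptions, not from the paper).
out_dim, in_dim, rank = 256, 512, 32
weight = rng.standard_normal((out_dim, in_dim))

left, right = svd_compress(weight, rank)

# One dense layer becomes two smaller ones: y = left @ (right @ x).
x = rng.standard_normal(in_dim)
y_full = weight @ x
y_low = left @ (right @ x)

params_full = weight.size               # out_dim * in_dim
params_low = left.size + right.size     # rank * (out_dim + in_dim)
rel_error = np.linalg.norm(y_full - y_low) / np.linalg.norm(y_full)

print("parameters before:", params_full)
print("parameters after: ", params_low)
print("relative output error:", rel_error)
```

A random matrix like the one above has no low-rank structure, so the approximation error here is large; the technique pays off only when the trained weights have rapidly decaying singular values, which is an empirical property of the model being compressed.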

Indexed By: EI
Language: English
Document Type: Conference paper
Identifier: http://ir.ia.ac.cn/handle/173211/23583
Collection: National Laboratory of Pattern Recognition, Speech Interaction
Affiliation: 1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
2. School of Computer and Control Engineering, University of Chinese Academy of Sciences
First Author Affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended Citation
GB/T 7714
Yibin Zheng, Jianhua Tao, Zhengqi Wen, et al. On The Application and Compression of Deep Time Delay Neural Network for Embedded Statistical Parametric Speech Synthesis[C], 2018.
Files in This Item:
File Name/Size: On The Application a (531KB)
DocType: Conference paper
Access: Open access
License: CC BY-NC-SA
File name: On The Application and Compression of Deep Time Delay Neural Network for Embedded Statistical Parametric Speech Synthesis.pdf
Format: Adobe PDF
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.