CTC Regularized Model Adaptation for Improving LSTM RNN Based Multi-Accent Mandarin Speech Recognition
Yi, Jiangyan (1,2); Wen, Zhengqi (1); Tao, Jianhua (1,2,3); Ni, Hao (1,2); Liu, Bin (1); Wen ZQ (温正棋)
Source Publication: JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY
2018-07-01
Volume: 90; Issue: 7; Pages: 985-997
Subtype: Article
Abstract: This paper proposes a novel regularized adaptation method to improve multi-accent Mandarin speech recognition. The acoustic model is a long short-term memory recurrent neural network trained with a connectionist temporal classification loss function (LSTM-RNN-CTC). Directly adjusting the network parameters on a small adaptation set may lead to over-fitting. To avoid this problem, a regularization term is added to the original training criterion; it forces the conditional probability distribution estimated by the adapted model to stay close to that of the accent-independent model. In addition, only the accent-specific output layer needs to be fine-tuned with this adaptation method. Experiments are conducted on the RASC863 and CASIA regional accented speech corpora. The results show that the proposed method clearly improves on the LSTM-RNN-CTC baseline model and also outperforms other adaptation methods.
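The regularized criterion described in the abstract can be illustrated with a small sketch. The abstract does not give the exact form of the regularization term, so everything specific here is an assumption: the penalty is taken to be a per-frame KL divergence between the accent-independent and adapted output distributions, it is interpolated with the CTC loss through a hypothetical weight `rho`, and the CTC loss itself is passed in as a precomputed number. The function names are illustrative and do not come from the paper.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of labels")
    # Terms with p_i == 0 contribute nothing; eps guards against log(0).
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def mean_frame_kl(ai_posteriors, adapted_posteriors):
    """Average per-frame KL from the accent-independent to the adapted outputs."""
    if not ai_posteriors or len(ai_posteriors) != len(adapted_posteriors):
        raise ValueError("both models must score the same non-empty frame sequence")
    total = sum(
        kl_divergence(p, q) for p, q in zip(ai_posteriors, adapted_posteriors)
    )
    return total / len(ai_posteriors)


def regularized_loss(ctc_loss, ai_posteriors, adapted_posteriors, rho=0.5):
    """Interpolate a CTC loss value with the KL penalty.

    rho is a hypothetical weight in [0, 1]: rho = 0 gives plain fine-tuning,
    rho = 1 ignores the adaptation data and only tracks the original model.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    penalty = mean_frame_kl(ai_posteriors, adapted_posteriors)
    return (1.0 - rho) * ctc_loss + rho * penalty
```

In a real system the gradient of such a loss would be applied only to the accent-specific output layer, with the shared LSTM layers frozen, as the abstract indicates; the sketch covers only the loss value.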
Keyword: Multi-accent Mandarin Speech Recognition; LSTM-RNN-CTC; Model Adaptation; CTC Regularization
WOS Headings: Science & Technology; Technology
DOI: 10.1007/s11265-017-1291-1
Indexed By: SCI
Language: English
Funding Organization: National High-Tech Research and Development Program of China (863 Program) (2015AA016305); National Natural Science Foundation of China (NSFC) (61425017, 61403386, 61305003); Strategic Priority Research Program of the CAS (XDB02080006); Major Program for the National Social Science Fund of China (13ZD189)
WOS Research Area: Computer Science; Engineering
WOS Subject: Computer Science, Information Systems; Engineering, Electrical & Electronic
WOS ID: WOS:000433555600004
Document Type: Journal article
Identifier: http://ir.ia.ac.cn/handle/173211/19882
Collection: National Laboratory of Pattern Recognition_Speech Interaction
Corresponding Author: Wen ZQ (温正棋)
Affiliation:
1. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing, Peoples R China
2. Univ Chinese Acad Sci, Sch Comp & Control Engn, Beijing, Peoples R China
3. Chinese Acad Sci, Inst Automat, CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing, Peoples R China
Recommended Citation
GB/T 7714
Yi, Jiangyan, Wen, Zhengqi, Tao, Jianhua, et al. CTC Regularized Model Adaptation for Improving LSTM RNN Based Multi-Accent Mandarin Speech Recognition[J]. JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2018, 90(7): 985-997.
APA: Yi, Jiangyan, Wen, Zhengqi, Tao, Jianhua, Ni, Hao, Liu, Bin, & 温正棋. (2018). CTC Regularized Model Adaptation for Improving LSTM RNN Based Multi-Accent Mandarin Speech Recognition. JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 90(7), 985-997.
MLA: Yi, Jiangyan, et al. "CTC Regularized Model Adaptation for Improving LSTM RNN Based Multi-Accent Mandarin Speech Recognition". JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY 90.7 (2018): 985-997.
Files in This Item:
File Name/Size | DocType | Version | Access | License
10.1007_s11265-017-1 (1416KB) | Journal article | Author accepted manuscript | Open access | CC BY-NC-SA
File name: 10.1007_s11265-017-1291-1.pdf
Format: Adobe PDF
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.