CASIA OpenIR > Research Center for Digital Content Technology and Services > Auditory Model and Cognitive Computing
Syllable-Based Acoustic Modeling with CTC for Multi-Scenarios Mandarin Speech Recognition
Yuanyuan Zhao (赵媛媛)1,2; Linhao Dong1,2; Shuang Xu1; Bo Xu1
Conference Name: IJCNN 2018
Conference Date: July 8-13, 2018
Conference Place: Rio de Janeiro, Brazil
Abstract: With the improvement of speech recognition, voice products are gradually being applied to every scene of daily life. The existing approach to handling various scenarios is often to build many different acoustic models using only scenario-dependent data, one for each scene. The obvious weakness of this approach is that it seriously hampers the large-scale deployment and maintenance of voice products. To address this issue, acoustic modeling based on context-independent syllables optimized with the CTC loss is presented for multi-scenario Mandarin speech recognition. On the one hand, context-independent modeling overcomes the tendency of context-dependent modeling to over-fit a particular scene. It also sidesteps the decision trees used in context-dependent modeling, so that in a real application there is no need to consider building a decision tree or deciding whether to retrain from scratch. On the other hand, choosing longer syllable acoustic units effectively preserves the co-articulation effects that context-dependent phones can model. Moreover, syllables have inherent advantages in Chinese: their number is fixed, and they are trainable, generalize effectively, and offer better robustness. This paper also explores the differences between wideband and narrowband data caused by the front-end signal acquisition block, proposes a unified training method that places VGG layers at the bottom of the network, and introduces layer normalization. The experimental results demonstrate that the proposed syllable-based CTC acoustic model for multiple scenarios achieves more than 15% and 7% relative improvement on mobile phone data and telephone data, respectively, compared with scenario-dependent modeling.
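The abstract's acoustic model is trained with the CTC loss over context-independent syllable units. As background only, here is a minimal pure-Python sketch of the standard CTC forward computation (the textbook algorithm, not the paper's implementation; real systems work in log space and over large syllable inventories):

```python
# Minimal CTC forward pass: the total probability that a sequence of
# per-frame posteriors collapses (after removing repeats and blanks)
# to a given label sequence. Toy example only.

BLANK = 0  # index of the CTC blank symbol

def ctc_forward(probs, labels):
    """probs: list of per-frame distributions over output units
       (e.g. syllables plus blank); labels: target unit indices.
       Returns P(labels | probs) under the CTC alignment model."""
    # Extend the labels with blanks: [b, l1, b, l2, ..., b]
    ext = [BLANK]
    for l in labels:
        ext += [l, BLANK]
    S = len(ext)
    # alpha[s] = probability of all partial alignments ending at ext[s]
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            p = alpha[s]                       # stay on the same symbol
            if s >= 1:
                p += alpha[s - 1]              # advance by one position
            # skip over a blank between two distinct non-blank units
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                p += alpha[s - 2]
            new[s] = p * probs[t][ext[s]]
        alpha = new
    # Valid alignments end on the last label or the trailing blank
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)
```

For example, with two frames, a toy vocabulary {blank, one syllable} and per-frame posteriors [0.4, 0.6] and [0.3, 0.7], the three alignments that collapse to the single syllable give 0.6·0.7 + 0.6·0.3 + 0.4·0.7 = 0.88.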
Keywords: Multi-scenarios; Context-independent; Syllable-based Modeling; Mandarin Speech Recognition; Layer Normalization
Indexed By: EI
Document Type: Conference Paper
Corresponding Author: Yuanyuan Zhao
Affiliation: 1. Institute of Automation, Chinese Academy of Sciences
2. University of Chinese Academy of Sciences
Recommended Citation
GB/T 7714
Zhao YY, Linhao Dong, Shuang Xu, et al. Syllable-Based Acoustic Modeling with CTC for Multi-Scenarios Mandarin Speech Recognition[C], 2018.
Files in This Item:
File Name: Syllable-Based Acoustic Modeling for Multi-Scenarios speech recognition.pdf (98KB)
DocType: Conference Paper | Access: Open Access | License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.