Syllable-Based Acoustic Modeling with CTC for Multi-Scenarios Mandarin speech recognition
Zhao YY(赵媛媛); Linhao Dong; Shuang Xu; Bo Xu; Yuanyuan Zhao
2018
Conference name: IJCNN2018
Conference date: 8-13 July 2018
Conference location: Rio de Janeiro, Brazil
Abstract: As speech recognition improves, voice products are gradually being applied in every area of daily life. Existing approaches to handling varied scenarios usually build a separate acoustic model for each scene, trained only on scenario-dependent data. The obvious weakness of these approaches is that they seriously hamper the large-scale application and maintenance of voice products. To address this issue, acoustic modeling based on context-independent syllables optimized with the CTC loss is presented for multi-scenario Mandarin speech recognition. On the one hand, context-independent modeling avoids the tendency of context-dependent modeling to over-fit a particular scene. It also sidesteps the decision trees used in context-dependent modeling, so there is no need to build a decision tree or to decide whether to restart training in a real application. On the other hand, choosing longer syllable acoustic units effectively preserves the co-articulation effects that context-dependent phones can model. Syllables in Chinese also have inherent advantages: their number is fixed, they are trainable, and they offer effective generalization and better robustness. This paper also explores the differences between wideband and narrowband data caused by the front-end signal acquisition block, proposes a unified training method based on the use of VGG in the bottom layer, and introduces layer normalization. The experimental results demonstrate that the proposed syllable-based CTC acoustic model for multiple scenarios achieves relative improvements of more than 15% and 7% on mobile phone data and telephone data, respectively, compared with scenario-dependent modeling.
Keywords: Multi-scenarios; Context-independent; Syllable-based Modeling; Mandarin Speech Recognition; Layer Normalization
Indexed by: EI
Language: English
Document type: Conference paper
Item identifier: http://ir.ia.ac.cn/handle/173211/40998
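Collection: 复杂系统认知与决策实验室_听觉模型与认知计算 (Laboratory of Complex Systems Cognition and Decision-Making, Auditory Models and Cognitive Computing)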
Corresponding author: Yuanyuan Zhao
Recommended citation
GB/T 7714
Zhao YY,Linhao Dong,Shuang Xu,et al. Syllable-Based Acoustic Modeling with CTC for Multi-Scenarios Mandarin speech recognition[C],2018.