Multidimensional Residual Learning Based on Recurrent Neural Networks for Acoustic Modeling

	Zhao, Yuanyuan; Xu, Shuang; Xu, Bo
	2016-09
Conference name	Interspeech 2016
Pages	3419-3423
Conference date	September 8-12
Conference location	San Francisco, USA
Abstract	Theoretical and empirical evidence indicates that the depth of neural networks is crucial to acoustic modeling in speech recognition tasks. In practice, however, as depth increases, accuracy saturates and then degrades rapidly. In this paper, a novel multidimensional residual learning architecture is proposed to address this degradation of deep recurrent neural networks (RNNs) in acoustic modeling by further exploring the spatial and temporal dimensions. In the spatial dimension, shortcut connections are introduced to RNNs, along which information can flow across several layers without attenuation. In the temporal dimension, we cope with the degradation problem by regulating temporal granularity, namely, splitting the input sequence into several parallel sub-sequences, which ensures that information flows unimpeded along the time axis. Finally, we place a row convolution layer on top of all recurrent layers to integrate appropriate information from the parallel sub-sequences and feed it to the classifier. Experiments are conducted on two quite different speech recognition tasks, and 10% relative performance improvements are observed.
Keywords	Acoustic Modeling; Multidimensional Residual Learning; Long Short-term Memory Block; Row Convolution Layer
Indexed by	EI
Language	English
Document type	Conference paper
Item identifier	http://ir.ia.ac.cn/handle/173211/41093
Collection	Laboratory of Cognition and Decision Intelligence for Complex Systems_Auditory Models and Cognitive Computing
Corresponding author	Yuanyuan Zhao
Recommended citation (GB/T 7714)	Zhao, Yuanyuan,Xu, Shuang,Xu, Bo,et al. Multidimensional Residual Learning Based on Recurrent Neural Networks for Acoustic Modeling[C],2016:3419-3423.