Self-Attention Transducers for End-to-End Speech Recognition
Zhengkun Tian1,2; Jiangyan Yi1; Jianhua Tao1,2,3; Ye Bai1,2; Zhengqi Wen1
2019-09
Conference: INTERSPEECH
Conference Dates: September 15–19, 2019
Conference Location: Graz, Austria
Abstract

Recurrent neural network transducers (RNN-T) have been successfully applied to end-to-end speech recognition. However, the recurrent structure makes parallelization difficult. In this paper, we propose a self-attention transducer (SA-T) for speech recognition. RNNs are replaced with self-attention blocks, which are powerful at modeling long-term dependencies within sequences and can be parallelized efficiently. Furthermore, a path-aware regularization is proposed to help the SA-T learn alignments and improve performance. Additionally, a chunk-flow mechanism is utilized to achieve online decoding. All experiments are conducted on a Mandarin Chinese dataset, AISHELL-1. The results demonstrate that our proposed approach achieves a 21.3% relative reduction in character error rate compared with the baseline RNN-T. In addition, the SA-T with the chunk-flow mechanism can perform online decoding with only a slight degradation in performance.
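The abstract does not spell out how the chunk-flow mechanism restricts self-attention, but a common way to make self-attention streaming-friendly is to mask each frame so it attends only within its own chunk and a fixed number of preceding chunks. The sketch below illustrates that idea; chunk_flow_mask, chunk_size, and num_left_chunks are hypothetical names and parameters for illustration, not taken from the paper.

import numpy as np

def chunk_flow_mask(seq_len, chunk_size, num_left_chunks):
    # Boolean mask: mask[i, j] is True where frame i may attend to frame j.
    # Each frame sees its own chunk plus num_left_chunks preceding chunks,
    # so no future context beyond the current chunk is required at decode time.
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        chunk_idx = i // chunk_size
        start = max(0, (chunk_idx - num_left_chunks) * chunk_size)
        end = min((chunk_idx + 1) * chunk_size, seq_len)
        mask[i, start:end] = True
    return mask

# Example: 8 frames, chunks of 2 frames, one chunk of left context.
print(chunk_flow_mask(8, chunk_size=2, num_left_chunks=1).astype(int))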

Indexed By: EI
Language: English
Document Type: Conference Paper
Identifier: http://ir.ia.ac.cn/handle/173211/48608
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems_Intelligent Interaction
Affiliations:
1. National Laboratory of Pattern Recognition, Institute of Automation, CASIA
2. School of Artificial Intelligence, University of Chinese Academy of Sciences
3. CAS Center for Excellence in Brain Science and Intelligence Technology
First Author Affiliation: National Laboratory of Pattern Recognition
Recommended Citation (GB/T 7714):
Zhengkun Tian, Jiangyan Yi, Jianhua Tao, et al. Self-Attention Transducers for End-to-End Speech Recognition[C], 2019.
Files in This Item:
File Name/Size: tian19b_interspeech.pdf (278 KB)
Document Type: Conference Paper
Access: Open Access
License: CC BY-NC-SA