Although attention-based end-to-end models have achieved promising performance in speech recognition, the multi-pass forward computation of beam search increases inference time, which limits their practical application. To address this issue, we propose a non-autoregressive end-to-end speech recognition system called LASO (listen attentively, and spell once). Owing to its non-autoregressive property, LASO predicts each textual token in the sequence without depending on the other tokens. Without beam search, this one-pass propagation greatly reduces the inference time of LASO. Moreover, because the model is built on an attention-based feedforward structure, the computation can be parallelized efficiently. We conduct experiments on the publicly available Chinese dataset AISHELL-1. LASO achieves a character error rate of 6.4%, outperforming the state-of-the-art autoregressive Transformer model (6.7%). The average inference latency is 21 ms, about 1/50 that of the autoregressive Transformer model.
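The latency contrast follows from the decoding procedure: an autoregressive decoder must run one forward step per output token, while a non-autoregressive model like LASO emits all positions in a single pass. A minimal toy sketch of this contrast (a hypothetical linear "decoder" over random features, not the paper's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, MAX_LEN, DIM = 10, 6, 8

# Hypothetical stand-ins: encoder output and an output projection.
enc = rng.standard_normal((MAX_LEN, DIM))
W = rng.standard_normal((DIM, VOCAB))

def non_autoregressive_decode(enc):
    """One forward pass: every position is predicted independently,
    so all positions can be computed in parallel."""
    logits = enc @ W                      # shape (MAX_LEN, VOCAB)
    return logits.argmax(axis=-1).tolist()

def autoregressive_decode(enc):
    """Greedy left-to-right: one forward step per token, each step
    conditioned on the previous token (toy dependence for contrast)."""
    tokens, prev = [], 0
    for t in range(MAX_LEN):
        step = enc[t] + 0.01 * prev       # history feeds back in
        tokens.append(int((step @ W).argmax()))
        prev = tokens[-1]
    return tokens

nar = non_autoregressive_decode(enc)      # 1 matrix multiply total
ar = autoregressive_decode(enc)           # MAX_LEN sequential steps
```

The sketch ignores attention and beam search entirely; its only point is that the non-autoregressive path is a single tensor operation, whereas the autoregressive path is an inherently sequential loop, which is what the reported 1/50 latency reduction exploits.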
Ye Bai, Jiangyan Yi, Jianhua Tao, et al. Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition[C], 2020.