Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition
Zhengkun Tian1,2; Jiangyan Yi1,2; Jianhua Tao1,2,3; Ye Bai1,2; Shuai Zhang1,2; Zhengqi Wen1,2
2020-10
Conference Name: INTERSPEECH
Conference Date: October 25–29, 2020
Conference Place: Shanghai, China
Abstract

Non-autoregressive transformer models have achieved extremely fast inference speed and performance comparable to autoregressive sequence-to-sequence models in neural machine translation. Most non-autoregressive transformers decode the target sequence from a mask sequence of predefined length. If the predefined length is too long, it causes many redundant calculations. If the predefined length is shorter than the length of the target sequence, it hurts the performance of the model. To address this problem and improve the inference speed, we propose a spike-triggered non-autoregressive transformer model for end-to-end speech recognition, which introduces a CTC module to predict the length of the target sequence and accelerate convergence. All experiments are conducted on a public Mandarin Chinese dataset, AISHELL-1. The results show that the proposed model can accurately predict the length of the target sequence and achieve performance competitive with advanced transformers. Moreover, the model achieves a real-time factor of 0.0056, which surpasses all mainstream speech recognition models.
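The two quantities named in the abstract, the target length predicted from CTC spikes and the real-time factor, can be illustrated with a minimal sketch. This is not the authors' implementation: the spike rule (a frame whose most likely label is non-blank, has probability at or above a threshold, and differs from the previous confident label), the blank index 0, the threshold 0.5, and the function names `find_spikes` and `real_time_factor` are all assumptions made for illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank label


def find_spikes(frame_posteriors: Sequence[Sequence[float]],
                threshold: float = 0.5) -> List[int]:
    """Return the frame indices treated as CTC spikes.

    Assumed rule: a frame is a spike when its most likely label is
    non-blank, its probability is at least `threshold`, and it does not
    repeat the label of the immediately preceding confident frame.
    """
    spikes: List[int] = []
    prev_label = BLANK
    for t, probs in enumerate(frame_posteriors):
        # Most likely label at frame t.
        label = max(range(len(probs)), key=probs.__getitem__)
        confident = label != BLANK and probs[label] >= threshold
        if confident and label != prev_label:
            spikes.append(t)
        # A blank or low-confidence frame resets the repeat check.
        prev_label = label if confident else BLANK
    return spikes


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: processing time divided by audio duration."""
    return processing_seconds / audio_seconds


# Toy posteriors over three labels: blank, "a", "b".
posteriors = [
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.80, 0.10],  # "a"  -> spike
    [0.20, 0.70, 0.10],  # "a"  -> repeat, merged
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.80, 0.10],  # "a"  -> spike (separated by a blank)
    [0.10, 0.10, 0.80],  # "b"  -> spike
    [0.95, 0.03, 0.02],  # blank
]
spike_frames = find_spikes(posteriors)
predicted_length = len(spike_frames)  # used as the decoder input length
```

In this toy case the spike frames are 1, 4, and 5, so the predicted target length is 3. Under the same definition, a real-time factor of 0.0056 corresponds to roughly 0.56 seconds of processing per 100 seconds of audio.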

Indexed By: EI
Language: English
Document Type: Conference paper
Identifier: http://ir.ia.ac.cn/handle/173211/48607
Collection: National Laboratory of Pattern Recognition_Intelligent Interaction
Corresponding Author: Jianhua Tao
Affiliation:
1. NLPR, Institute of Automation, Chinese Academy of Sciences
2. School of Artificial Intelligence, University of Chinese Academy of Sciences
3. CAS Center for Excellence in Brain Science and Intelligence Technology
First Author Affiliation: Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
Corresponding Author Affiliation: Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
Recommended Citation
GB/T 7714
Zhengkun Tian, Jiangyan Yi, Jianhua Tao, et al. Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition[C], 2020.
Files in This Item:
File Name/Size: tian20c_interspeech.(629KB)
DocType: Conference paper
Access: Open access
License: CC BY-NC-SA
File name: tian20c_interspeech.pdf
Format: Adobe PDF
 

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.