Knowledge Commons of Institute of Automation, CAS
SSCFormer: Push the Limit of Chunk-Wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution | |
Wang, Fangyuan; Xu, Bo; Xu, Bo | |
Journal | IEEE SIGNAL PROCESSING LETTERS |
ISSN | 1070-9908 |
2024 | |
Volume | 31, Pages: 421-425 |
Corresponding author | Xu, Bo(boxu@ia.ac.cn) |
Abstract | Chunk-wise schemes are now widely used to make Automatic Speech Recognition (ASR) models support streaming deployment. However, existing approaches are unable to capture the global context, lack support for parallel training, or exhibit quadratic complexity in the computation of multi-head self-attention (MHSA). Meanwhile, causal convolution, which uses no future context, has become the de facto module in streaming Conformer. In this letter, we propose SSCFormer to push the limit of chunk-wise Conformer for streaming ASR using the following two techniques: 1) A novel cross-chunk context generation method, named the Sequential Sampling Chunk (SSC) scheme, which re-partitions chunks from regularly partitioned chunks to facilitate efficient long-term contextual interaction within local chunks. 2) The Chunked Causal Convolution (C2Conv), which is designed to concurrently capture the left context and chunk-wise future context. Evaluations on AISHELL-1 show that an End-to-End (E2E) CER of 5.33% can be achieved, which even outperforms a strong time-restricted baseline, U2. Moreover, the chunk-wise MHSA computation in our model enables it to train with a large batch size and perform inference with linear complexity. |
Keywords | Convolution; Complexity theory; Computational modeling; Decoding; Training; Kernel; Transformers; Conformer; streaming ASR; sequentially sampled chunks; chunked causal convolution; linear complexity |
DOI | 10.1109/LSP.2024.3352489 |
Indexed by | SCI |
Language | English |
Funding project | Strategic Priority Research Program of the Chinese Academy of Sciences |
Funding organization | Strategic Priority Research Program of the Chinese Academy of Sciences |
WOS research area | Engineering |
WOS category | Engineering, Electrical & Electronic |
WOS accession number | WOS:001166718500005 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
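Document type | Journal article |
Item identifier | http://ir.ia.ac.cn/handle/173211/57781 |
Collection | Laboratory of Cognition and Decision Intelligence for Complex Systems_Auditory Models and Cognitive Computing |
Corresponding author | Xu, Bo |
Author affiliation | Chinese Acad Sci, Inst Automat, Beijing 10090, Peoples R China |
First author affiliation | Institute of Automation, Chinese Academy of Sciences |
Corresponding author affiliation | Institute of Automation, Chinese Academy of Sciences |
Recommended citation (GB/T 7714) | Wang, Fangyuan,Xu, Bo,Xu, Bo. SSCFormer: Push the Limit of Chunk-Wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution[J]. IEEE SIGNAL PROCESSING LETTERS,2024,31:421-425. |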
APA | Wang, Fangyuan,Xu, Bo,&Xu, Bo.(2024).SSCFormer: Push the Limit of Chunk-Wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution.IEEE SIGNAL PROCESSING LETTERS,31,421-425. |
MLA | Wang, Fangyuan,et al."SSCFormer: Push the Limit of Chunk-Wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution".IEEE SIGNAL PROCESSING LETTERS 31(2024):421-425. |
Files in this item | No files are associated with this item. |