SSCFormer: Push the Limit of Chunk-Wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution
Wang, Fangyuan; Xu, Bo; Xu, Bo
Journal: IEEE SIGNAL PROCESSING LETTERS
ISSN: 1070-9908
Publication year: 2024
Volume: 31; Pages: 421-425
Corresponding author: Xu, Bo (boxu@ia.ac.cn)
Abstract: Chunk-wise schemes are commonly used to make Automatic Speech Recognition (ASR) models support streaming deployment. However, existing approaches are unable to capture the global context, lack support for parallel training, or exhibit quadratic complexity in the computation of multi-head self-attention (MHSA). Meanwhile, causal convolution, which uses no future context, has become the de facto module in streaming Conformer. In this letter, we propose SSCFormer to push the limit of chunk-wise Conformer for streaming ASR using the following two techniques: 1) a novel cross-chunk context generation method, named the Sequential Sampling Chunk (SSC) scheme, which re-partitions the regularly partitioned chunks to facilitate efficient long-term contextual interaction within local chunks; 2) the Chunked Causal Convolution (C2Conv), which is designed to concurrently capture the left context and chunk-wise future context. Evaluations on AISHELL-1 show that an End-to-End (E2E) CER of 5.33% can be achieved, which even outperforms a strong time-restricted baseline, U2. Moreover, the chunk-wise MHSA computation in our model enables it to train with a large batch size and perform inference with linear complexity.
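The contrast between quadratic and linear MHSA complexity mentioned in the abstract can be illustrated with a small counting sketch. It is not the SSCFormer implementation: it only counts query-key pairs, assuming full attention scores every frame against every frame while chunk-wise attention scores frames only within their own fixed-size chunk. The function names and the chunk size of 16 are hypothetical, chosen for illustration.

```python
def full_attention_pairs(num_frames: int) -> int:
    """Query-key pairs scored when every frame attends to every frame."""
    return num_frames * num_frames


def chunked_attention_pairs(num_frames: int, chunk_size: int) -> int:
    """Query-key pairs scored when frames attend only within their own chunk.

    Assumption: chunks are contiguous and non-overlapping, and the last
    chunk may be shorter than chunk_size.
    """
    full_chunks, remainder = divmod(num_frames, chunk_size)
    return full_chunks * chunk_size * chunk_size + remainder * remainder


if __name__ == "__main__":
    chunk_size = 16  # hypothetical value, not taken from the paper
    for num_frames in (1000, 2000, 4000):
        print(
            num_frames,
            full_attention_pairs(num_frames),
            chunked_attention_pairs(num_frames, chunk_size),
        )
```

Under these assumptions, doubling the number of frames quadruples the full-attention count but only roughly doubles the chunk-wise count, which is the sense in which chunk-wise MHSA scales linearly with sequence length.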
Keywords: Convolution; Complexity theory; Computational modeling; Decoding; Training; Kernel; Transformers; Conformer; streaming ASR; sequentially sampled chunks; chunked causal convolution; linear complexity
DOI: 10.1109/LSP.2024.3352489
Indexed by: SCI
Language: English
Funding project: Strategic Priority Research Program of the Chinese Academy of Sciences
Funding organization: Strategic Priority Research Program of the Chinese Academy of Sciences
WOS research area: Engineering
WOS subject category: Engineering, Electrical & Electronic
WOS accession number: WOS:001166718500005
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
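Document type: Journal article
Item identifier: http://ir.ia.ac.cn/handle/173211/57781
Collection: Laboratory of Cognition and Decision Intelligence for Complex Systems, Auditory Models and Cognitive Computing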
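Author affiliation: Chinese Acad Sci, Inst Automat, Beijing 10090, Peoples R China
First author affiliation: Institute of Automation, Chinese Academy of Sciences
Corresponding author affiliation: Institute of Automation, Chinese Academy of Sciences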
Recommended citation
GB/T 7714: Wang, Fangyuan, Xu, Bo, Xu, Bo. SSCFormer: Push the Limit of Chunk-Wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution[J]. IEEE SIGNAL PROCESSING LETTERS, 2024, 31: 421-425.
APA: Wang, Fangyuan, Xu, Bo, & Xu, Bo. (2024). SSCFormer: Push the Limit of Chunk-Wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution. IEEE SIGNAL PROCESSING LETTERS, 31, 421-425.
MLA: Wang, Fangyuan, et al. "SSCFormer: Push the Limit of Chunk-Wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution." IEEE SIGNAL PROCESSING LETTERS 31 (2024): 421-425.
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.