Train from scratch: Single-stage joint training of speech separation and recognition
Shi, Jing (1); Chang, Xuankai (2); Watanabe, Shinji (2); Xu, Bo (1, 3)
Journal: COMPUTER SPEECH AND LANGUAGE
ISSN: 0885-2308
Published: 2022-11-01
Volume: 76; Pages: 15
Corresponding author: Watanabe, Shinji (shinjiw@ieee.org)
Abstract: Multi-speaker speech separation and recognition has recently gained much attention in the speech community. Most previous studies trained the front-end separation module and the back-end recognition module individually, and then combined the two trained modules either in a hybrid structure or by fine-tuning the resulting model. In this work, we present a unified and flexible multi-speaker end-to-end ASR model. In contrast to previous studies, our proposed model is trained from scratch in a single complete stage, rather than in multiple stages of pre-training followed by fine-tuning. Our model can handle either single-channel or multi-channel speech input. Moreover, the proposed model can be trained with or without the clean source speech references. We evaluate the proposed model on the WSJ0-2mix dataset in both single-channel and spatialized multi-channel conditions. The experiments demonstrate that the proposed methods improve the performance of the end-to-end model in recognizing the separated streams without much degradation in speech separation, achieving a new state of the art on the WSJ0-2mix dataset. We also systematically assess the impact of various factors on the success of the joint-training model and will release all our code, which may provide new guidance for integrating front-end and back-end processing in complex auditory scenes.
Keywords: Cocktail party problem; Speech separation; Multi-speaker speech recognition; End-to-end; Joint-training
DOI: 10.1016/j.csl.2022.101387
Keywords [WOS]: DOMAIN AUDIO SEPARATION; NEURAL-NETWORKS; END
Indexed in: SCI
Language: English
WOS research area: Computer Science
WOS category: Computer Science, Artificial Intelligence
WOS accession number: WOS:000798734700002
Publisher: ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD
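Citation statistics
Times cited: 3 (WOS)
Document type: Journal article
Item identifier: http://ir.ia.ac.cn/handle/173211/49507
Collection: Laboratory of Cognition and Decision for Complex Systems, Auditory Models and Cognitive Computing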
Affiliations:
1. Chinese Acad Sci CASIA, Inst Automat, Beijing, Peoples R China
2. Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
3. Univ Chinese Acad Sci, Beijing, Peoples R China
First author's affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended citation
GB/T 7714: Shi, Jing, Chang, Xuankai, Watanabe, Shinji, et al. Train from scratch: Single-stage joint training of speech separation and recognition[J]. COMPUTER SPEECH AND LANGUAGE, 2022, 76: 15.
APA: Shi, Jing, Chang, Xuankai, Watanabe, Shinji, & Xu, Bo. (2022). Train from scratch: Single-stage joint training of speech separation and recognition. COMPUTER SPEECH AND LANGUAGE, 76, 15.
MLA: Shi, Jing, et al. "Train from scratch: Single-stage joint training of speech separation and recognition." COMPUTER SPEECH AND LANGUAGE 76 (2022): 15.
Files in this item
No files are associated with this item.
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.