Multi-Cue Guided Semi-Supervised Learning Toward Target Speaker Separation in Real Environments
Xu, Jiaming1; Cui, Jian2,3; Hao, Yunzhe2,3; Xu, Bo2,3,4
Journal: IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
ISSN: 2329-9290
Publication year: 2024
Volume: 32; Pages: 151-163
Corresponding author: Xu, Bo (xubo@ia.ac.cn)
Abstract: To solve the cocktail party problem in real multi-talker environments, this article proposes a multi-cue guided semi-supervised target speaker separation method (MuSS). MuSS integrates three target speaker-related cues: spatial, visual, and voiceprint. Under the guidance of these cues, the target speaker is separated into a predefined output channel, and the interfering sources are separated into the other output channels with the optimal permutation. Both synthetic and real mixtures are used for semi-supervised training. For synthetic mixtures, the separated target source and the separated interfering sources are trained to reconstruct the ground-truth references; for real mixtures, a mixture of two real mixtures is fed into the separation model, and the separated sources are remixed to reconstruct the two real mixtures. In addition, to facilitate finetuning and evaluation on real mixtures, we introduce a real multi-modal speech separation dataset, RealMuSS, which was collected in real-world scenarios and comprises more than one hundred hours of multi-talker mixtures with high-quality pseudo references of the target speakers. Experimental results show that the pseudo references effectively improve finetuning efficiency and enable the model to learn from, and be evaluated on, real mixtures, and that various cue-driven separation models are greatly improved in signal-to-noise ratio and speech recognition accuracy under our semi-supervised learning framework.
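The remixing step for real mixtures described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `remix_loss`, the mean-squared-error criterion, and the exhaustive search over source-to-mixture assignments are all assumptions chosen for illustration, and the cue-guided target channel is not modeled. The sketch only shows the general idea of scoring separated sources by how well they can be remixed into the two original mixtures.

```python
from itertools import product


def remix_loss(separated, mix_a, mix_b):
    """Hypothetical remix objective (illustrative, not the paper's loss).

    separated: list of equal-length sample lists (separation model outputs).
    mix_a, mix_b: the two real mixtures, as sample lists of the same length.
    Returns (loss, assignment), where assignment[i] is 0 if source i is
    remixed into mix_a and 1 if it is remixed into mix_b.
    """
    n = len(mix_a)
    best_loss, best_assignment = float("inf"), None
    # Try every way of assigning each separated source to one of the two mixtures.
    for assignment in product((0, 1), repeat=len(separated)):
        remix = [[0.0] * n, [0.0] * n]
        for source, target in zip(separated, assignment):
            for t, sample in enumerate(source):
                remix[target][t] += sample
        # Mean squared reconstruction error over both mixtures (an assumption).
        loss = sum((r - m) ** 2 for r, m in zip(remix[0], mix_a))
        loss += sum((r - m) ** 2 for r, m in zip(remix[1], mix_b))
        loss /= 2 * n
        if loss < best_loss:
            best_loss, best_assignment = loss, assignment
    return best_loss, best_assignment


# Toy usage: three perfectly separated sources, two of which form mix_a.
s1 = [1.0, 0.0, -1.0, 0.0]
s2 = [0.0, 2.0, 0.0, -2.0]
s3 = [0.5, 0.5, 0.5, 0.5]
mix_a = [a + c for a, c in zip(s1, s3)]
mix_b = list(s2)
loss, assignment = remix_loss([s1, s2, s3], mix_a, mix_b)
print(loss, assignment)  # prints: 0.0 (0, 1, 0)
```

The exhaustive search grows as 2^N in the number of output channels, which is acceptable for the handful of channels typical in separation models. A training setup would compute the same quantity on tensors so that gradients can flow through the selected assignment.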
Keywords: Cocktail party problem; target speaker separation; multi-cue guided separation; semi-supervised learning
DOI: 10.1109/TASLP.2023.3323856
Keywords [WOS]: SPEECH RECOGNITION; EXTRACTION
Indexed by: SCI
Language: English
Funding project: National Key Research and Development Program of China [2021ZD0201500]; Strategic Priority Research Program of the Chinese Academy of Sciences [XDB32070000]
Funding organization: National Key Research and Development Program of China; Strategic Priority Research Program of the Chinese Academy of Sciences
WOS research area: Acoustics; Engineering
WOS subject category: Acoustics; Engineering, Electrical & Electronic
WOS accession number: WOS:001097062800011
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Document type: Journal article
Identifier: http://ir.ia.ac.cn/handle/173211/55153
Collection: Laboratory of Cognition and Decision Intelligence for Complex Systems
Author affiliations:
1. Xiaomi Corp, Beijing 100085, Peoples R China
2. Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
3. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 101408, Peoples R China
4. Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Shanghai 200031, Peoples R China
Corresponding author affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended citation:
GB/T 7714: Xu, Jiaming, Cui, Jian, Hao, Yunzhe, et al. Multi-Cue Guided Semi-Supervised Learning Toward Target Speaker Separation in Real Environments[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32: 151-163.
APA: Xu, Jiaming, Cui, Jian, Hao, Yunzhe, & Xu, Bo. (2024). Multi-Cue Guided Semi-Supervised Learning Toward Target Speaker Separation in Real Environments. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 32, 151-163.
MLA: Xu, Jiaming, et al. "Multi-Cue Guided Semi-Supervised Learning Toward Target Speaker Separation in Real Environments". IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 32 (2024): 151-163.