End-to-End Post-Filter for Speech Separation With Deep Attention Fusion Features
Fan, Cunhang1,2; Tao, Jianhua1,2,3; Liu, Bin1; Yi, Jiangyan1; Wen, Zhengqi1; Liu, Xuefei1
Journal: IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
ISSN: 2329-9290
Year: 2020
Volume: 28, Issue: 28, Pages: 1303-1314
Abstract

In this article, we propose an end-to-end post-filter method with deep attention fusion features for monaural speaker-independent speech separation. First, a time-frequency domain speech separation method is applied as the pre-separation stage, whose aim is to separate the mixture preliminarily. Although this stage can separate the mixture, its output still contains residual interference. To enhance the pre-separated speech and further improve separation performance, the end-to-end post-filter (E2EPF) with deep attention fusion features is proposed. The E2EPF makes full use of the prior knowledge in the pre-separated speech, which contributes to speech separation. It is a fully convolutional speech separation network that takes the waveform as its input feature. First, a 1-D convolutional layer extracts deep representation features from the mixture and the pre-separated signals in the time domain. Second, to pay more attention to the outputs of the pre-separation stage, an attention module is applied to acquire deep attention fusion features, which are extracted by computing the similarity between the mixture and the pre-separated speech. These deep attention fusion features help reduce the interference and enhance the pre-separated speech. Finally, these features are sent to the post-filter to estimate each target signal. Experimental results on the WSJ0-2mix dataset show that the proposed method outperforms the state-of-the-art speech separation method. Compared with the pre-separation method, the proposed method achieves relative improvements of 64.1%, 60.2%, 25.6% and 7.5% in the scale-invariant source-to-noise ratio (SI-SNR), the signal-to-distortion ratio (SDR), the perceptual evaluation of speech quality (PESQ) and the short-time objective intelligibility (STOI) measures, respectively.
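The SI-SNR figure quoted in the abstract can be illustrated with a minimal sketch. It uses the commonly used definition of SI-SNR (zero-mean signals, projection of the estimate onto the target); the record does not state which exact formulation the authors used, so this is an assumption. The function names `si_snr` and `relative_improvement`, the `eps` guard, and the formula for "relative improvement" as a percentage change over the baseline are also illustrative assumptions, not taken from the paper.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length 1-D signals.

    Assumed (commonly used) definition, not confirmed by the source record.
    """
    if len(estimate) != len(target):
        raise ValueError("signals must have the same length")
    n = len(target)
    # Remove the mean of each signal so a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]
    # Project the estimate onto the target: this is the "clean" component.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    scale = dot / tgt_energy
    s_target = [scale * t for t in tgt]
    # Whatever remains after the projection is treated as noise.
    e_noise = [e - s for e, s in zip(est, s_target)]
    num = sum(s * s for s in s_target)
    den = sum(e * e for e in e_noise) + eps
    return 10.0 * math.log10(num / den + eps)


def relative_improvement(baseline, improved):
    """Percentage change of `improved` over `baseline` (assumed definition)."""
    return 100.0 * (improved - baseline) / baseline


if __name__ == "__main__":
    target = [1.0, -1.0, 1.0, -1.0]
    estimate = [1.1, -0.9, 0.9, -1.1]
    print(round(si_snr(estimate, target), 2))
    # Rescaling the estimate leaves the score essentially unchanged.
    print(round(si_snr([2.0 * x for x in estimate], target), 2))
```

Because the estimate is projected onto the target before the ratio is taken, multiplying the estimate by any non-zero gain leaves the score essentially unchanged, which is why the measure is called scale-invariant.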

Keywords: Feature extraction; Training; Interference; Speech enhancement; Clustering algorithms; Spectrogram; Speech separation; end-to-end post-filter; deep attention fusion features; deep clustering; permutation invariant training
DOI: 10.1109/TASLP.2020.2982029
Keywords [WOS]: NETWORK
Indexed by: SCI
Language: English
Funding projects: National Key Research and Development Plan of China [2017YFC0820602]; National Natural Science Foundation of China (NSFC) [61831022]; National Natural Science Foundation of China (NSFC) [61771472]; National Natural Science Foundation of China (NSFC) [61901473]; National Natural Science Foundation of China (NSFC) [61773379]; Inria-CAS Joint Research Project [173211KYSB20170061]; Inria-CAS Joint Research Project [173211KYSB20190049]
Funders: National Key Research and Development Plan of China; National Natural Science Foundation of China (NSFC); Inria-CAS Joint Research Project
WOS research areas: Acoustics; Engineering
WOS categories: Acoustics; Engineering, Electrical & Electronic
WOS accession number: WOS:000536055600001
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Seven major directions, sub-direction classification: Speech recognition and synthesis
Citation statistics
Times cited: 26 [WOS]
Document type: Journal article
Item identifier: http://ir.ia.ac.cn/handle/173211/39517
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems_Intelligent Interaction
Corresponding authors: Tao, Jianhua; Liu, Bin
Author affiliations:
1. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
2. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100190, Peoples R China
3. CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing 100190, Peoples R China
First author affiliation: National Laboratory of Pattern Recognition
Corresponding author affiliation: National Laboratory of Pattern Recognition
Recommended citation
GB/T 7714
Fan, Cunhang, Tao, Jianhua, Liu, Bin, et al. End-to-End Post-Filter for Speech Separation With Deep Attention Fusion Features[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28(28): 1303-1314.
APA Fan, Cunhang, Tao, Jianhua, Liu, Bin, Yi, Jiangyan, Wen, Zhengqi, & Liu, Xuefei. (2020). End-to-End Post-Filter for Speech Separation With Deep Attention Fusion Features. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 28(28), 1303-1314.
MLA Fan, Cunhang, et al. "End-to-End Post-Filter for Speech Separation With Deep Attention Fusion Features". IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 28.28 (2020): 1303-1314.
Files in this item
File: TASLP-SEPARATION.pdf (1344KB); Document type: Journal article; Version: Author accepted manuscript; Access: Open access; License: CC BY-NC-SA