Deep Learning Based Speech Separation via NMF-Style Reconstructions
Authors: Nie, Shuai (1); Liang, Shan (1); Liu, Wenju (1); Zhang, Xueliang (2); Tao, Jianhua (1)
Publication date: 2018-11-01
Journal: IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
Volume: 26; Issue: 11; Pages: 2043-2055
Article type: Article
Abstract: Deep learning based speech separation usually uses a supervised algorithm to learn a mapping function from noisy features to separation targets. These separation targets, either ideal masks or magnitude spectrograms, have prominent spectro-temporal structures. Nonnegative matrix factorization (NMF) is a well-known representation learning technique that is capable of capturing the basic spectral structures. Combining deep learning and NMF into a unified whole is therefore an appealing strategy. However, previous methods typically apply deep neural networks (DNNs) and NMF to speech separation separately. In this paper, we propose a joint scheme that combines the strengths of both DNN and NMF for speech separation. NMF is used to learn basis spectra, which are then integrated into a DNN to directly reconstruct the magnitude spectrograms of speech and noise. Instead of predicting the activation coefficients inferred by NMF, which previous methods use as an intermediate target, the DNN in our system directly optimizes the actual separation objective, so that accumulated errors can be alleviated. Moreover, we explore a discriminative training objective with sparsity constraints to further suppress noise and preserve more speech components. Systematic experiments show that the proposed models are competitive with the previous methods.
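The abstract describes three pieces: NMF basis spectra learned for speech and noise, a DNN whose nonnegative outputs act as activations on those fixed bases so that it reconstructs the magnitude spectrograms directly, and a discriminative objective with sparsity constraints. The NumPy sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. The squared Euclidean NMF cost with Lee-Seung multiplicative updates, the one-layer ReLU stand-in for the DNN, the exact form of the discriminative term (weight `gamma`), the L1 sparsity weight `lam`, the ratio mask, and all function names (`learn_nmf_basis`, `predict_activations`, `nmf_style_reconstruct`, `discriminative_sparse_loss`, `masked_speech_estimate`) are hypothetical choices made for illustration. No training loop is shown.

```python
import numpy as np


def learn_nmf_basis(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Learn nonnegative basis spectra W (F x rank) and activations H (rank x T)
    from a magnitude spectrogram V (F x T) using Lee-Seung multiplicative
    updates for the squared Euclidean cost ||V - W H||_F^2 (an assumed cost)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Unit-norm basis columns; rescale H so that W @ H is unchanged.
    norms = np.linalg.norm(W, axis=0, keepdims=True) + eps
    return W / norms, H * norms.T


def predict_activations(Y, weights, bias):
    """Hypothetical stand-in for the DNN: one affine layer plus ReLU mapping
    noisy magnitudes Y (F x T) to nonnegative activations ((K_s + K_n) x T)."""
    return np.maximum(0.0, weights @ Y + bias[:, None])


def nmf_style_reconstruct(H, W_s, W_n):
    """Split the activations between the fixed speech and noise bases and
    reconstruct both magnitude spectrograms as basis @ activations."""
    k_s = W_s.shape[1]
    H_s, H_n = H[:k_s], H[k_s:]
    return W_s @ H_s, W_n @ H_n, H_s, H_n


def discriminative_sparse_loss(S_hat, N_hat, S, N, H_s, H_n, gamma=0.05, lam=1e-3):
    """Assumed objective: reconstruction error on the actual targets, minus a
    discriminative term that rewards dissimilarity to the other source, plus an
    L1 sparsity penalty on the activations. gamma and lam are illustrative."""
    fit = np.sum((S_hat - S) ** 2) + np.sum((N_hat - N) ** 2)
    cross = np.sum((S_hat - N) ** 2) + np.sum((N_hat - S) ** 2)
    sparsity = np.sum(np.abs(H_s)) + np.sum(np.abs(H_n))
    return fit - gamma * cross + lam * sparsity


def masked_speech_estimate(S_hat, N_hat, Y, eps=1e-9):
    """Optional post-processing (assumed): a ratio mask built from the two
    reconstructions and applied to the noisy magnitudes."""
    return S_hat / (S_hat + N_hat + eps) * Y


# Usage on synthetic data (random matrices stand in for real spectrograms).
rng = np.random.default_rng(1)
F, T = 64, 100
S = rng.random((F, T))
N = rng.random((F, T))
Y = S + N  # additive magnitude mixing is a simplifying assumption
W_s, _ = learn_nmf_basis(S, rank=8)
W_n, _ = learn_nmf_basis(N, rank=8)
weights = 0.01 * rng.standard_normal((16, F))
bias = np.zeros(16)
H = predict_activations(Y, weights, bias)
S_hat, N_hat, H_s, H_n = nmf_style_reconstruct(H, W_s, W_n)
loss = discriminative_sparse_loss(S_hat, N_hat, S, N, H_s, H_n)
S_est = masked_speech_estimate(S_hat, N_hat, Y)
```

Keeping the bases fixed and letting the network output only activations is what makes the reconstruction "NMF-style": the loss is computed on the reconstructed spectrograms rather than on NMF-inferred activation targets, which is the distinction the abstract draws against earlier two-stage methods.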
Keywords: Speech Separation; Deep Neural Network (DNN); Nonnegative Matrix Factorization (NMF); Spectro-temporal Structures
WOS headings: Science & Technology; Technology
DOI: 10.1109/TASLP.2018.2851151
Keywords [WOS]: NONNEGATIVE MATRIX FACTORIZATION; AUDIO SOURCE SEPARATION; TO-NOISE RATIO; NEURAL-NETWORKS; INTELLIGIBILITY; QUALITY; MASK; RECOGNITION; ALGORITHMS
Indexed in: SCI
Language: English
Funding: China National Nature Science Foundation (61573357; 61503382; 61403370; 61273267; 91120303); National Science Fund for Distinguished Young Scholars (61425017)
WOS research areas: Acoustics; Engineering
WOS categories: Acoustics; Engineering, Electrical & Electronic
WOS accession number: WOS:000441430600008
Document type: Journal article
Item identifier: http://ir.ia.ac.cn/handle/173211/21839
Collection: National Laboratory of Pattern Recognition / Speech Interaction
Author affiliations:
1. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
2. Inner Mongolia Univ, Coll Comp Sci, Hohhot 010021, Peoples R China
Recommended citation format
GB/T 7714: Nie, Shuai, Liang, Shan, Liu, Wenju, et al. Deep Learning Based Speech Separation via NMF-Style Reconstructions[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26(11): 2043-2055.
APA: Nie, Shuai, Liang, Shan, Liu, Wenju, Zhang, Xueliang, & Tao, Jianhua. (2018). Deep Learning Based Speech Separation via NMF-Style Reconstructions. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 26(11), 2043-2055.
MLA: Nie, Shuai, et al. "Deep Learning Based Speech Separation via NMF-Style Reconstructions." IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 26.11 (2018): 2043-2055.
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.