Deep Learning Based Speech Separation via NMF-Style Reconstructions
Nie, Shuai (1); Liang, Shan (1); Liu, Wenju (1); Zhang, Xueliang (2); Tao, Jianhua (1)
Source Publication: IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
2018-11-01
Volume: 26; Issue: 11; Pages: 2043-2055
Subtype: Article
Abstract: Deep learning based speech separation usually uses a supervised algorithm to learn a mapping function from noisy features to separation targets. These separation targets, either ideal masks or magnitude spectrograms, have prominent spectro-temporal structures. Nonnegative matrix factorization (NMF) is a well-known representation learning technique that is capable of capturing the basic spectral structures. Therefore, the combination of deep learning and NMF as an organic whole is a smart strategy. However, previous methods typically use deep neural networks (DNN) and NMF for speech separation in a separate manner. In this paper, we propose a jointly combinatorial scheme to concentrate the strengths of both DNN and NMF for speech separation. NMF is used to learn the basis spectra that then are integrated into a DNN to directly reconstruct the magnitude spectrograms of speech and noise. Instead of predicting activation coefficients inferred by NMF, which is used as an intermediate target by the previous methods, DNN directly optimizes an actual separation objective in our system, so that the accumulated errors could be alleviated. Moreover, we explore a discriminative training objective with sparsity constraints to suppress noise and preserve more speech components further. Systematic experiments show that the proposed models are competitive with the previous methods.
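The reconstruction idea described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the multiplicative-update NMF, the single random linear layer standing in for a trained DNN, the toy data, and all names (`nmf`, `nmf_style_reconstruction`, `weights`) are assumptions made for illustration. The discriminative objective and sparsity constraints mentioned in the abstract are not modeled here.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix V (freq x frames) as W @ H
    using multiplicative updates for the squared-error objective."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def nmf_style_reconstruction(features, weights, W_speech, W_noise):
    """Map features to nonnegative activations, then reconstruct
    speech and noise magnitudes from fixed NMF bases."""
    # ReLU keeps activations nonnegative; `weights` is a hypothetical
    # stand-in for a trained network, not a real model.
    acts = np.maximum(weights @ features, 0.0)
    k = W_speech.shape[1]
    speech_hat = W_speech @ acts[:k]
    noise_hat = W_noise @ acts[k:]
    return speech_hat, noise_hat

# Toy nonnegative "spectrograms" (assumed data, not from the paper).
rng = np.random.default_rng(1)
n_freq, n_frames, rank = 16, 40, 4
speech = rng.random((n_freq, rank)) @ rng.random((rank, n_frames))
noise = rng.random((n_freq, rank)) @ rng.random((rank, n_frames))

# Step 1: learn basis spectra for each source with NMF.
W_s, _ = nmf(speech, rank)
W_n, _ = nmf(noise, rank)

# Step 2: reconstruct both sources from the mixture through the bases.
mixture = speech + noise
weights = rng.standard_normal((2 * rank, n_freq)) * 0.1  # untrained placeholder
speech_hat, noise_hat = nmf_style_reconstruction(mixture, weights, W_s, W_n)

# The loss is defined on the reconstructed magnitudes themselves,
# rather than on NMF activation coefficients as an intermediate target.
loss = np.mean((speech_hat - speech) ** 2) + np.mean((noise_hat - noise) ** 2)
```

In a trained system the `weights` placeholder would be replaced by network parameters optimized against `loss`, so that errors in the activations are penalized only through their effect on the reconstructed spectrograms.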
Keyword: Speech Separation; Deep Neural Network (DNN); Nonnegative Matrix Factorization (NMF); Spectro-temporal Structures
WOS Headings: Science & Technology ; Technology
DOI: 10.1109/TASLP.2018.2851151
WOS Keyword: NONNEGATIVE MATRIX FACTORIZATION ; AUDIO SOURCE SEPARATION ; TO-NOISE RATIO ; NEURAL-NETWORKS ; INTELLIGIBILITY ; QUALITY ; MASK ; RECOGNITION ; ALGORITHMS
Indexed By: SCI
Language: English
Funding Organization: China National Nature Science Foundation (61573357 ; 61503382 ; 61403370 ; 61273267 ; 91120303) ; National Science Fund for Distinguished Young Scholars (61425017)
WOS Research Area: Acoustics ; Engineering
WOS Subject: Acoustics ; Engineering, Electrical & Electronic
WOS ID: WOS:000441430600008
Document Type: Journal article
Identifier: http://ir.ia.ac.cn/handle/173211/21839
Collection: National Laboratory of Pattern Recognition_Speech Interaction
Affiliation: 1. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
2. Inner Mongolia Univ, Coll Comp Sci, Hohhot 010021, Peoples R China
Recommended Citation
GB/T 7714: Nie, Shuai, Liang, Shan, Liu, Wenju, et al. Deep Learning Based Speech Separation via NMF-Style Reconstructions[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26(11): 2043-2055.
APA: Nie, Shuai, Liang, Shan, Liu, Wenju, Zhang, Xueliang, & Tao, Jianhua. (2018). Deep Learning Based Speech Separation via NMF-Style Reconstructions. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 26(11), 2043-2055.
MLA: Nie, Shuai, et al. "Deep Learning Based Speech Separation via NMF-Style Reconstructions". IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 26.11 (2018): 2043-2055.
Files in This Item:
There are no files associated with this item.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.