Speech Separation Based on Effective Fusion of Acoustic Cues with Deep Learning (声学线索挖掘与深度学习有效融合的语音分离方法)

Title	声学线索挖掘与深度学习有效融合的语音分离方法
Alternative title	Speech Separation Based on Effective Fusion of Acoustic Cues with Deep Learning
Author	江巍 (Jiang Wei)
Date	2015-05-31
Degree type	Doctor of Engineering
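Chinese abstract (translated)	This dissertation studies computational auditory scene analysis (CASA) algorithms in the single-channel setting, with the aim of denoising and separating speech in a variety of noisy environments. With the rapid development in recent years of deep machine learning theory and techniques, represented chiefly by deep neural networks, CASA has gradually moved from algorithms characterized by mechanism modeling to a stage characterized by machine learning. However, how to select and design an appropriate deep network structure, and how to make effective use of acoustic cues from multiple angles and aspects, still requires in-depth study. The main work and contributions of this dissertation are as follows:
· Cooperative deep stacking networks for speech separation. While designing deep-learning-based speech separation algorithms, we found that the deep stacking network (DSN) is well suited to the speech separation task and lends itself to parallel implementation. Building on this, and inspired by multi-task learning, we propose a cooperative DSN framework that makes it easy to integrate multiple acoustic cues.
· Cross-domain and multi-resolution cooperative speech separation based on deep learning. Using the proposed deep learning framework, we first fuse features of the same speech signal from different time-frequency representation domains. Different time-frequency representations reflect different aspects of the same signal and provide complementary information, which improves separation performance. Furthermore, even within a single time-frequency domain, fusing features at different resolution scales also improves performance.
· A two-talker speech separation system based on deep networks. Some existing two-talker separation systems rely on the pitch of both talkers and on the local correlation of speech. Others use generative statistical models, such as the MAXVQ and Linear VQ methods built on GMM speaker models, and codebook-based non-negative matrix factorization (NMF) and sparse coding. What these algorithms have in common is that they try to model the generation of mixed speech through simple approximations or linear superposition. This dissertation explores two-talker separation based directly on a discriminative model, the deep neural network (DNN), and then extends the proposed cooperative DSN to the two-talker system. The experimental results confirm the effectiveness of the DNN-based algorithm and show that the cooperative DSN performs better.
· A reverberation-robust two-pitch extraction algorithm based on empirical mode decomposition. Pitch is an important acoustic cue in CASA, and multipitch extraction is itself a difficult problem. Conventional pitch extraction based on the summary autocorrelation function depends on peak positions and peak heights, which are unreliable in multipitch extraction. To address this, we use Hilbert empirical mode decomposition to obtain a time-frequency unit feature representation called the frequency matching function (FMF). Using the summary FMF for two-pitch extraction improves on the performance of existing two-pitch extraction algorithms.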
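The deep stacking network named in the first item can be illustrated with a minimal NumPy sketch. It shows a plain DSN only, not the cooperative framework proposed in the dissertation, whose details are not given in this record. Assumptions made here for illustration: each module has one sigmoid hidden layer with random, fixed input weights (real DSNs usually fine-tune them), the output weights are fitted by ridge regression, each module sees the raw input plus the previous module's output, and the data are synthetic. The names `train_dsn` and `predict_dsn` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dsn(X, T, n_modules=3, n_hidden=32, ridge=1e-3, seed=0):
    """Train a simplified deep stacking network.

    X: (n_samples, n_features) input features.
    T: (n_samples, n_targets) training targets (for example, mask values).
    Returns a list of (W, b, U) tuples, one per module.
    """
    rng = np.random.default_rng(seed)
    modules = []
    inp = X
    for _ in range(n_modules):
        n_in = inp.shape[1]
        # Assumption: hidden weights are random and kept fixed.
        W = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_hidden))
        b = rng.normal(scale=0.1, size=n_hidden)
        H = sigmoid(inp @ W + b)
        # Output weights in closed form (ridge-regularized least squares).
        U = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ T)
        modules.append((W, b, U))
        # Stacking step: the next module sees the raw input plus this output.
        inp = np.hstack([X, H @ U])
    return modules

def predict_dsn(modules, X):
    """Run the stacked modules in order and return the last module's output."""
    inp = X
    Y = None
    for W, b, U in modules:
        Y = sigmoid(inp @ W + b) @ U
        inp = np.hstack([X, Y])
    return Y

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 10))             # synthetic features
    T = sigmoid(X @ rng.normal(size=(10, 4)))  # synthetic targets in (0, 1)
    Y = predict_dsn(train_dsn(X, T), X)
    print("training MSE:", np.mean((Y - T) ** 2))
```

Because each module's output weights have a closed-form solution, the expensive step is a batch linear solve, which is one reason this kind of network is considered easy to parallelize.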
English abstract	This dissertation focuses on the problem of separating speech from various intrusive sounds using computational auditory scene analysis (CASA) under co-channel conditions. With the rapid development of deep learning, CASA systems based on deep machine learning rather than mechanism modeling have drawn much attention from researchers in this field over the past few years. Nevertheless, how to select and design a proper network structure, and how to use various acoustic cues effectively in a deep learning setting, needs further exploration. The main work and contributions of this dissertation include:
· Cooperative deep stacking network (DSN) for speech denoising. While designing speech separation algorithms based on deep learning, we found that the DSN is well suited to this task and is readily parallelized. Based on the DSN, we propose a cooperative DSN framework that can integrate multiple acoustic cues to perform speech separation jointly.
· Cross-domain and multiscale cooperative DSN for speech denoising. Based on the proposed cooperative DSN framework, we first fused features from different time-frequency domains. Because different representations of the speech mixture provide complementary information, separation performance improves. Furthermore, separation was carried out at different time-frequency scales, which also improved performance.
· Speaker separation based on deep learning. Existing speaker separation algorithms fall into two categories. Some rely on pitch information and a local correlation model, and others use generative statistical models. For example, MAXVQ and Linear VQ are based on GMMs, while NMF and sparse coding methods are based on codebook generation. What they have in common is the use of an approximation or a linear model. In this dissertation, speaker separation was first carried out with a discriminative DNN model, and the proposed cooperative DSN model was then applied to the same task. Quantitative experiments show the effectiveness of our method, and the cooperative DSN obtained better results.
· Multipitch tracking based on empirical mode decomposition (EMD) and HMM. Although pitch is an important acoustic cue in speech separation, tracking multiple pitches simultaneously is not easy. As the amplitude and peak position of autocorrelation function are sometimes not reliabl...
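For reference, the autocorrelation-based pitch estimate that the last item describes as unreliable for multipitch tracking can be sketched as below. This is the conventional single-pitch baseline, not the dissertation's EMD-based method. Assumptions made for illustration: a crude FFT band split stands in for the gammatone filterbank that CASA systems typically use, the band edges and pitch range are arbitrary, the input is a synthetic harmonic tone, and the function names are hypothetical.

```python
import numpy as np

def band_split(x, fs, edges):
    """Split x into band-limited channels by zeroing FFT bins.

    Assumption: a crude stand-in for an auditory (gammatone) filterbank.
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    channels = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        channels.append(np.fft.irfft(spectrum * mask, n=len(x)))
    return np.array(channels)

def summary_autocorrelation(channels, max_lag):
    """Sum the normalized autocorrelation of every channel over lags 0..max_lag."""
    sacf = np.zeros(max_lag + 1)
    for ch in channels:
        ch = ch - ch.mean()
        full = np.correlate(ch, ch, mode="full")
        acf = full[len(ch) - 1 : len(ch) + max_lag]  # lags 0..max_lag
        if acf[0] > 1e-12:                           # skip silent channels
            sacf += acf / acf[0]
    return sacf

def estimate_pitch(x, fs, fmin=80.0, fmax=400.0,
                   edges=(50.0, 500.0, 1100.0, 1700.0, 4000.0)):
    """Return one pitch estimate in Hz from the highest summary-ACF peak."""
    channels = band_split(x, fs, np.asarray(edges, dtype=float))
    min_lag = int(fs / fmax)
    max_lag = int(fs / fmin)
    sacf = summary_autocorrelation(channels, max_lag)
    lag = min_lag + int(np.argmax(sacf[min_lag : max_lag + 1]))
    return fs / lag

if __name__ == "__main__":
    fs = 8000
    t = np.arange(400) / fs
    # Synthetic voiced-like tone: 200 Hz fundamental with ten harmonics.
    x = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 11))
    print("estimated pitch (Hz):", estimate_pitch(x, fs))
```

With two simultaneous talkers, the peaks of the summed function belong to two interleaved periodicities, so picking peaks by position and height alone becomes ambiguous. That is the weakness the abstract points to.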
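Keywords	speech separation; computational auditory scene analysis (CASA); cooperative deep stacking network (DSN); multipitch tracking; deep neural network (DNN)
Language	Chinese
Document type	Dissertation
Item identifier	http://ir.ia.ac.cn/handle/173211/6740
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	江巍. 声学线索挖掘与深度学习有效融合的语音分离方法[D]. 中国科学院自动化研究所. 中国科学院大学, 2015.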

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20101801462803 (1161 KB)			Not currently open	CC BY-NC-SA