基于听觉谱局域关联建模的语音分离方法研究
Alternative Title: Speech separation based on local correlation model of cochleagram
梁山
Subtype: Doctor of Engineering
Thesis Advisor: 刘文举
2013-06-01
Degree Grantor: University of Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: Speech Separation; Computational Auditory Scene Analysis; Local Correlation Model; Bayes Estimation; Ideal Binary Mask; Ideal Ratio Mask
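Abstract: Separating target speech from background noise is an important problem in speech signal processing, and computational auditory scene analysis (CASA) is one feasible approach to it. Because speech is non-stationary, most speech separation systems first transform the time-domain signal into a two-dimensional time-frequency (T-F) representation. The separation problem can then be cast as estimating an ideal binary or ideal ratio mask. In recent years, statistical classification models have been widely applied to ideal binary mask estimation. However, the correlation between adjacent T-F units has received little attention. This thesis studies that correlation information in depth and integrates it into existing separation systems to improve separation performance. The main contributions are as follows.

How well the ideal binary mask (IBM) approximates the ideal ratio mask (IRM) in terms of signal-to-noise ratio (SNR). The IBM and IRM are the two most commonly used computational goals in speech separation. Because choosing the computational goal is a key step in designing a complex computational system, we first analyze and compare the two masks in terms of mean square error (MSE) and SNR. Using Parseval's theorem, we derive expressions for the MSE of the IBM and the IRM in the energy-spectrum domain. Under an approximate W-disjoint orthogonality assumption, we analyze the approximate MSE of the IRM. We then analyze the difference in MSE between the two masking strategies and derive an approximate upper bound of 3.01 dB on their difference in SNR gain. In practical separation tasks the difference is smaller than this bound, typically below 1 dB.

IBM estimation with a Bayesian method that integrates a local correlation model. We propose an adaptive prior model of the IBM based on T-F segmentation and a prior model of the noise based on local noise tracking, and integrate both into the original Bayesian classifier. Because local correlation is taken into account, the posterior distribution becomes a high-dimensional function, so we use a Markov chain Monte Carlo (MCMC) algorithm to approximate the expectation of the IBM. Experiments show that the correlation model improves both the accuracy of IBM estimation and the SNR of the separated speech. It also suppresses isolated outlier units in the original IBM estimate, yielding a smoother mask.

IRM estimation based on a Markov random field model of the speech cochleagram. Because the IBM makes an all-or-nothing decision for each unit, IBM estimation errors directly cause abrupt changes in speech energy between adjacent T-F units of the cochleagram, which contradicts the continuity and slowly varying nature of the speech cochleagram. We propose a Markov random field prior model of the cochleagram and, based on this prior, use the ICM (Iterated Conditional Modes) algorithm to smooth the cochleagram corresponding to the estimated binary mask. Finally, speech is resynthesized with a ratio masking strategy. Experiments show that the smoothing algorithm improves separation performance and is especially effective at suppressing artifacts.

The optimal ratio mask (ORM) for speech separation in the SNR sense. Maximizing SNR is equivalent to minimizing MSE, and, again using Parseval's theorem, minimizing the MSE becomes a convex optimization problem. In expectation, the ORM halves the MSE relative to the IRM, which corresponds to a 3.01 dB higher SNR gain. Speech quality evaluation experiments show that the ORM also significantly improves perceptual quality. This work can be regarded as an extension of the computational-goal analysis. The analysis further suggests that the ORM can be estimated in a way similar to the IRM: first estimate a binary mask, then generalize it to a ratio mask using the local correlation model of the cochleagram. ORM estimation is one of the main focuses of our future work.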
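The mask comparison in the abstract can be made concrete with a small numerical sketch. It rests on a toy model chosen here for illustration, not on the thesis's own derivation: a single T-F unit with fixed speech energy `s` and noise energy `n`, a speech/noise phase difference drawn uniformly at random, and a mixture `Y = S + N`. Following the Parseval argument, the error is measured directly on T-F coefficients, which equals the time-domain error only for an orthonormal transform (an assumption). The ORM is taken to be the real per-unit gain that minimizes `|S - m*Y|^2`, a one-dimensional convex quadratic. The function names are hypothetical.

```python
import numpy as np


def monte_carlo_mask_mse(s, n, num_phases=200_000, seed=0):
    """Average per-unit error |S - m*Y|^2 for three masks, by Monte Carlo.

    Toy assumptions (not taken from the thesis): one T-F unit with fixed
    speech energy s = |S|^2 and noise energy n = |N|^2, a speech/noise phase
    difference uniform on [0, 2*pi), and mixture Y = S + N.
    """
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, num_phases)
    S = np.full(num_phases, np.sqrt(s), dtype=complex)
    N = np.sqrt(n) * np.exp(1j * theta)
    Y = S + N

    ibm = (np.abs(S) ** 2 > np.abs(N) ** 2).astype(float)  # 0 dB local criterion
    irm = np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2)
    # ORM sketch: the real gain minimising |S - m*Y|^2 (least squares per unit).
    orm = np.real(S * np.conj(Y)) / np.abs(Y) ** 2

    def mse(mask):
        return float(np.mean(np.abs(S - mask * Y) ** 2))

    return {"IBM": mse(ibm), "IRM": mse(irm), "ORM": mse(orm)}


def closed_form_mask_mse(s, n):
    """Expected errors for the same toy model, obtained by averaging over the phase."""
    return {"IBM": min(s, n), "IRM": s * n / (s + n), "ORM": min(s, n) / 2.0}


if __name__ == "__main__":
    for s, n in [(1.0, 1.0), (4.0, 1.0), (100.0, 1.0)]:
        mc, cf = monte_carlo_mask_mse(s, n), closed_form_mask_mse(s, n)
        gap_db = 10.0 * np.log10(cf["IBM"] / cf["IRM"])  # IBM vs IRM SNR-gain gap
        print(f"s={s:g} n={n:g}  MC={mc}  closed={cf}  IBM-IRM gap={gap_db:.2f} dB")
```

In this toy model the ratio of IBM to IRM error is `(s + n) / max(s, n)`, which never exceeds 2, i.e. 10·log10(2) ≈ 3.01 dB, reached when `s = n`; this matches the 3.01 dB bound stated in the abstract, and the gap shrinks quickly once one source dominates (about 0.97 dB at `s = 4n`). The ORM error is exactly half the IBM error here, and therefore roughly half the IRM error when one source dominates the unit, which is the W-disjoint orthogonality regime the abstract relies on.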
Other Abstract: Separating target speech signals from background noise is one of the key problems in speech processing. Computational auditory scene analysis is one promising approach to this problem. In most present speech separation systems, the time-domain signal is first decomposed into the time-frequency (T-F) domain because of the non-stationarity of speech. The separation problem can then be transformed into an ideal binary or ratio mask estimation task. In recent years, statistical classification models have been widely used for binary mask estimation. However, the correlation information between adjacent T-F units has not received much attention. In this thesis, we study this local correlation information and integrate it with the original systems to improve performance. The main works and innovations include:

Discussing how well the ideal binary mask (IBM) approximates the ideal ratio mask (IRM) in the signal-to-noise ratio (SNR) sense. The IBM and IRM are two commonly used computational goals in speech separation. Since the computational goal is one of the key problems in designing a complex system, we first analyze and compare the two mask models in the mean square error (MSE) and SNR senses. Using Parseval's equality, we derive expressions for the MSE in the T-F domain corresponding to the IBM and the IRM. Under an approximate W-disjoint orthogonality assumption, we analyze the MSE of the IRM. Then, the difference between the two masks is analyzed. We further find that the upper bound of the difference in SNR is approximately 3.01 dB. In practical separation tasks, the difference is usually smaller than 1 dB.

Integrating the local correlation model with Bayes classification for IBM estimation. We propose a time-frequency segmentation based adaptive prior model of the IBM and a local noise tracking based prior model of noise energy. Then, the two prior models are integrated with the original Bayes classification. Because of the local correlation information, the posterior distribution is a high-dimensional function. Finally, we use a Markov chain Monte Carlo algorithm to approximate the expectation of the IBM. Experiments show that the local correlation model improves the accuracy of IBM estimation and the SNR gain of the separated speech. Furthermore, since some outliers in the IBM estimate are suppressed, a smoother estimate is obtained.

Markov random field based sp...
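The MCMC step, approximating the posterior expectation of a binary mask whose units are correlated, can be illustrated with a Gibbs sampler. This is a sketch under loud assumptions: the thesis's segmentation-based IBM prior and noise-tracking prior are replaced here by a generic Ising-style coupling `beta` between 4-connected T-F neighbours, and `llr` stands for per-unit log-likelihood ratios from some classifier. The model, `llr`, `beta`, and the function name are all hypothetical.

```python
import numpy as np


def gibbs_mask_posterior_mean(llr, beta, n_sweeps=200, burn_in=50, seed=0):
    """Estimate E[m] for a binary T-F mask m by Gibbs sampling (an MCMC method).

    Hypothetical model, not the thesis's exact priors:
      p(m | x) ∝ exp( sum_i llr[i] * m[i] + beta * sum_{<i,j>} 1[m_i == m_j] )
    where llr[i] is a per-unit log-likelihood ratio and <i,j> ranges over
    4-connected neighbours along time and frequency.
    """
    rng = np.random.default_rng(seed)
    T, F = llr.shape
    m = (llr > 0).astype(int)  # start from the independent per-unit decision
    acc = np.zeros_like(llr, dtype=float)
    kept = 0
    for sweep in range(n_sweeps):
        for t in range(T):
            for f in range(F):
                # Sum of (2*m_j - 1) over in-bounds neighbours: the Ising term's
                # contribution to the log-odds of m[t, f] = 1 versus 0.
                s = 0
                for dt, df in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    tt, ff = t + dt, f + df
                    if 0 <= tt < T and 0 <= ff < F:
                        s += 2 * m[tt, ff] - 1
                log_odds = llr[t, f] + beta * s
                p1 = 1.0 / (1.0 + np.exp(-log_odds))
                m[t, f] = int(rng.random() < p1)  # sample from the full conditional
        if sweep >= burn_in:
            acc += m  # accumulate post-burn-in samples
            kept += 1
    return acc / kept  # Monte Carlo estimate of the posterior mean E[m]


if __name__ == "__main__":
    llr = np.full((20, 20), -2.0)  # noise-dominated background
    llr[5:15, 5:15] = 2.0          # speech-dominated region
    llr[2, 2] = 1.0                # isolated outlier unit
    indep = gibbs_mask_posterior_mean(llr, beta=0.0)
    smooth = gibbs_mask_posterior_mean(llr, beta=1.0)
    print("outlier E[m]:", indep[2, 2], "->", smooth[2, 2])
    mask = (smooth > 0.5).astype(int)  # binary decision from the posterior mean
```

With `beta = 0` the sampler reduces to independent per-unit decisions; a positive `beta` pulls an isolated unit toward its neighbours, which is the kind of outlier suppression and smoothing the abstract describes. Single-site Gibbs sampling is chosen only for simplicity; it mixes slowly when coupling is strong.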
Shelf Number: XWLW1871
Other Identifier: 201018014628046
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/6556
Collection: Graduates_Doctoral Dissertations
Recommended Citation
GB/T 7714
梁山. 基于听觉谱局域关联建模的语音分离方法研究[D]. 中国科学院自动化研究所. 中国科学院大学,2013.
Files in This Item:
There are no files associated with this item.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.