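Research on Speech Separation Methods Based on Local Correlation Modeling of the Cochleagram (基于听觉谱局域关联建模的语音分离方法研究)
Alternative title: Speech separation based on local correlation model of cochleagram
Liang Shan (梁山)
Degree type: Doctor of Engineering
Advisor: Liu Wenju (刘文举)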
2013-06-01
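Degree-granting institution: University of Chinese Academy of Sciences
Place of conferral: Institute of Automation, Chinese Academy of Sciences
Major: Pattern Recognition and Intelligent Systems
Keywords: Speech Separation; Computational Auditory Scene Analysis; Local Correlation Model; Bayes Estimation; Ideal Binary Mask; Ideal Ratio Mask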
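Abstract: Separating the target speech signal from background noise is an important problem in speech signal processing, and computational auditory scene analysis (CASA) is one feasible approach to it. Because speech is non-stationary, most speech separation systems first transform the time-domain signal into a two-dimensional time-frequency (T-F) representation. The separation problem can then be recast as estimating the ideal binary mask or the ideal ratio mask. In recent years, statistical classification models have been widely applied to ideal binary mask estimation, but the correlation between adjacent T-F units has not received enough attention. This thesis studies that correlation in depth and integrates it with existing separation systems to improve separation performance. The main contributions are as follows.

Approximation between the ideal binary mask (IBM) and the ideal ratio mask (IRM) in the signal-to-noise ratio (SNR) sense. The IBM and the IRM are the two most commonly used computational goals in speech separation. Since fixing the computational goal is a key step in designing a complex computational system, we first analyze and compare the two masks in terms of mean square error (MSE) and SNR. Using Parseval's identity, we derive expressions for the MSE of the IBM and of the IRM in the energy spectrum domain. Under an approximate W-disjoint orthogonality assumption, we analyze the approximate MSE of the IRM. We then analyze the MSE difference between the two masking strategies and derive an approximate upper bound of 3.01 dB on their difference in SNR gain. In practical separation tasks the difference is smaller than this bound, typically below 1 dB.

IBM estimation with a Bayesian method that incorporates a local correlation model. We propose an adaptive prior distribution for the IBM based on T-F segmentation and a prior distribution for the noise based on local noise tracking. Both models are integrated with the original Bayesian classifier. Because local correlation is taken into account, the posterior distribution is a high-dimensional function, so a Markov chain Monte Carlo algorithm is used to approximate the expectation of the IBM. Experiments show that the correlation model improves both the accuracy of IBM estimation and the SNR of the separated speech. It also effectively suppresses isolated outlier units in the original IBM estimate, yielding a smoother estimate.

IRM estimation based on a Markov conditional random field model of the speech cochleagram. Because the IBM uses an all-or-nothing masking strategy, IBM estimation errors directly cause abrupt changes in speech energy between adjacent T-F units of the cochleagram, which contradicts the continuity and slow variation of the speech cochleagram. We propose a cochleagram prior model based on a Markov conditional random field. Based on this prior, the ICM (Iterated Conditional Modes) algorithm is used to smooth the cochleagram corresponding to the estimated binary mask. Finally, a ratio masking strategy is used to recover the speech signal. Experiments show that the smoothing improves separation performance and, in particular, suppresses artifact noise more effectively.

The optimal ratio mask (ORM) for speech separation in the SNR sense. Maximizing the SNR is equivalent to minimizing the MSE. Again using Parseval's identity, MSE minimization is converted into a convex optimization problem. In expectation, the ORM halves the MSE relative to the IRM, which raises the SNR gain by 3.01 dB. Speech quality evaluation experiments show that the ORM also significantly improves perceptual quality. This work can be seen as an extension of the analysis of computational goals. The analysis also indicates that the ORM can be estimated in a way similar to the IRM: first estimate the binary mask, then generalize it to a ratio mask using the local correlation model of the cochleagram. ORM estimation is one focus of our future work.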
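The three masks compared in the abstract can be made concrete with a small numerical sketch. It is not taken from the thesis. It assumes complex T-F coefficients from an orthonormal transform (so that, by Parseval's identity, the error energy can be summed in the T-F domain), a 0 dB local criterion for the IBM, and synthetic random data in place of real speech and noise. The ORM line is simply the least-squares real-valued gain per T-F unit under these assumptions, which may differ in detail from the thesis's cochleagram-domain formulation. The names `masks`, `snr_db`, and `complex_noise` are hypothetical.

```python
import numpy as np

def masks(S, N, eps=1e-12):
    """Return (IBM, IRM, ORM) for complex target S and noise N coefficients."""
    ps, pn = np.abs(S) ** 2, np.abs(N) ** 2
    Y = S + N                                   # mixture coefficients
    ibm = (ps > pn).astype(float)               # binary: 1 where target dominates (0 dB criterion, assumed)
    irm = ps / (ps + pn + eps)                  # energy-ratio mask
    # Real gain m minimizing |S - m*Y|^2 per unit: m = Re(S * conj(Y)) / |Y|^2
    orm = np.real(S * np.conj(Y)) / (np.abs(Y) ** 2 + eps)
    return ibm, irm, orm

def snr_db(S, estimate):
    """SNR of an estimate of S, with the error energy summed over T-F units."""
    err = np.sum(np.abs(S - estimate) ** 2)
    return 10.0 * np.log10(np.sum(np.abs(S) ** 2) / err)

def complex_noise(rng, shape):
    """Unit-variance circular complex Gaussian samples (synthetic stand-in data)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

rng = np.random.default_rng(0)
shape = (64, 200)                               # channels x frames, arbitrary sizes
S = rng.gamma(0.5, 2.0, size=shape) * complex_noise(rng, shape)  # uneven "speech-like" energy
N = complex_noise(rng, shape)
Y = S + N

snr = {name: snr_db(S, m * Y) for name, m in zip(("IBM", "IRM", "ORM"), masks(S, N))}
for name, value in snr.items():
    print(f"{name}: {value:.2f} dB")
```

Because the ORM line is the per-unit minimizer over all real gains, its SNR cannot fall below that of the IBM or the IRM on the same data. The 3.01 dB figures in the abstract correspond to a factor of two in MSE, since 10 log10(2) ≈ 3.01 dB.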
Other abstract: Separating the target speech signal from background noise is one of the key problems in speech processing. Computational auditory scene analysis is one promising approach to this problem. In most present speech separation systems, the time-domain signal is first decomposed into the time-frequency (T-F) domain because speech is non-stationary. The separation problem can then be transformed into an ideal binary or ratio mask estimation task. In recent years, statistical classification models have been widely used for binary mask estimation. However, the correlation information between adjacent time-frequency units has not received much attention. In this thesis, we study the local correlation information and integrate it with the original systems to improve performance. The main contributions include:

Discussing how closely the ideal binary mask (IBM) approximates the ideal ratio mask (IRM) in the signal-to-noise ratio (SNR) sense. The IBM and the IRM are two commonly used computational goals in speech separation. Since choosing the computational goal is one of the key problems in designing a complex system, we first analyze and compare the two mask models in the mean square error (MSE) and SNR senses. Using Parseval's equality, we derive expressions for the MSE in the T-F domain corresponding to the IBM and the IRM. Under an approximate W-disjoint orthogonality assumption, we analyze the MSE of the IRM. Then the difference between the two masks is analyzed. We further find that the upper bound of the difference in SNR is approximately 3.01 dB. In practical separation, the difference is usually smaller than 1 dB.

Integrating the local correlation model with Bayes classification for IBM estimation. We propose an adaptive prior model of the IBM based on time-frequency segmentation and a prior model of the noise energy based on local noise tracking. The two prior models are then integrated with the original Bayes classification. Because of the local correlation information, the posterior distribution is a high-dimensional function. Finally, we use a Markov chain Monte Carlo algorithm to approximate the expectation of the IBM. Experiments show that the local correlation model improves the accuracy of the IBM estimation and the SNR gain of the separated speech. Furthermore, since some outliers in the IBM estimate are suppressed, a smoother estimate is obtained.

Markov random field based sp...
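The idea of approximating the expectation of a binary mask by Markov chain Monte Carlo when neighbouring T-F units are coupled can be sketched as follows. This is an illustration under stated assumptions, not the thesis's model. A plain Ising-style coupling between the four nearest neighbours stands in for the segmentation-based adaptive prior, the noise prior is omitted, and `log_odds` is a hypothetical array of per-unit log-odds from some unit-wise classifier. The function name `gibbs_mask_mean` and all parameter values are made up for the example.

```python
import numpy as np

def gibbs_mask_mean(log_odds, coupling=0.8, sweeps=120, burn_in=20, seed=0):
    """Estimate E[mask] by Gibbs sampling under an Ising-style posterior (assumed model).

    log_odds[c, t] is the unit-wise log p(mask=1) / p(mask=0) before coupling.
    """
    rng = np.random.default_rng(seed)
    n_chan, n_frame = log_odds.shape
    mask = (log_odds > 0).astype(int)           # start from the unit-wise decision
    total = np.zeros((n_chan, n_frame))
    for sweep in range(sweeps):
        for c in range(n_chan):
            for t in range(n_frame):
                # Sum of neighbour labels mapped from {0, 1} to {-1, +1}
                s = 0
                for dc, dt in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    cc, tt = c + dc, t + dt
                    if 0 <= cc < n_chan and 0 <= tt < n_frame:
                        s += 2 * mask[cc, tt] - 1
                # Conditional log-odds = unary term + Ising neighbour term
                logit = log_odds[c, t] + 2.0 * coupling * s
                p_one = 1.0 / (1.0 + np.exp(-logit))
                mask[c, t] = int(rng.random() < p_one)
        if sweep >= burn_in:
            total += mask                       # accumulate samples after burn-in
    return total / (sweeps - burn_in)

# Toy input: a target-dominated region with one isolated unit voting "noise".
log_odds = np.full((8, 8), 2.0)
log_odds[4, 4] = -1.0
posterior_mean = gibbs_mask_mean(log_odds)
print(posterior_mean[4, 4])
```

In this toy input the unit-wise decision at `[4, 4]` is 0, while the coupled posterior mean there is pulled towards its neighbours. This mirrors, in a much simplified form, the outlier suppression described in the abstract.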
Call number: XWLW1871
Other identifier: 201018014628046
Language: Chinese
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/6556
Collection: Graduates_Doctoral Theses
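Recommended citation
GB/T 7714
Liang Shan. Research on Speech Separation Methods Based on Local Correlation Modeling of the Cochleagram [D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2013.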
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.