Research on the Microphone Array Speech Enhancement Algorithms Based on the Perceptual Properties of Human Ears
Original title (Chinese): 基于听觉感知特性的麦克风阵列语音增强算法研究
Author: Cheng Ning (程宁)
Degree type: Doctor of Engineering
Advisor: Liu Wenju (刘文举)
2009-05-30
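Degree-granting institution: Graduate University of the Chinese Academy of Sciences (中国科学院研究生院)
Place of degree conferral: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Degree discipline: Pattern Recognition and Intelligent Systems
Keywords: Speech Enhancement; Microphone Array; Post-filter; Signal Subspace; Multi-statistical Distributions; Perceptual Properties of Human Ears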
Abstract: Speech in real environments is often corrupted by noise. Microphone array speech enhancement is an effective way to suppress noise and improve speech quality, but despite considerable progress in recent years it still does not fully meet practical requirements. This thesis studies two microphone array speech enhancement algorithms: the post-filter algorithm and the signal subspace algorithm. The key to the post-filter algorithm is accurate estimation of the post-filter; the key to the signal subspace algorithm is sound selection of the subspace dimension and estimation of the linear filter. This thesis uses the perceptual properties of human hearing to study and improve these key points. The main work and contributions are as follows.

According to the correlation of the signals received on the microphone array, the noise is divided into correlated noise and uncorrelated noise. On this basis, the properties of the auto-power spectrum of the signal at each array element and of the cross-power spectra between elements are analyzed, and a post-filter estimation method is given. The post-filter is then expressed in matrix form and combined with the auditory masking effect of the human ear, which further improves its speech enhancement performance.

Threshold-based selection of the subspace dimension in conventional signal subspace algorithms is inaccurate. Exploiting the property that the eigenvalues in the noise subspace should be equal, a more accurate subspace selection method is proposed. A method is also proposed for estimating the noise power spectrum in the noise subspace using conditional probability. The auditory masking effect is used to estimate the values of the Lagrange multipliers in the linear filter, giving a more effective estimate of the linear filter.

Gaussian, Laplacian, and Gamma distributions are used to describe the distributions of the speech and noise received on the microphone array. A method is proposed that estimates the signal subspace dimension by maximizing the speech presence probability, and the noise power spectrum is estimated according to that probability. The auditory masking effect is used to trade off speech distortion against residual noise in the enhanced speech, yielding an effective post-filter estimation method.

The study of microphone array speech enhancement algorithms based on the auditory masking effect is the focus and highlight of this thesis. Compared with conventional algorithms, the proposed algorithms show clear improvements in both noise reduction and speech distortion reduction, achieving the intended research goals.
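One of the contributions summarized in the abstract rests on the property that the eigenvalues in the noise subspace should be equal. The sketch below illustrates that general idea only; it is not the thesis's selection method, which the abstract does not specify. The function name `estimate_signal_dimension`, the parameter `rel_tol`, and the spread-versus-mean test are all assumptions introduced for illustration.

```python
import numpy as np


def estimate_signal_dimension(cov, rel_tol=0.1):
    """Illustrative helper (hypothetical, not from the thesis).

    Returns the smallest k such that the eigenvalues remaining after
    removing the k largest ones are equal to within rel_tol of their mean.
    rel_tol is an assumed tuning parameter.
    """
    # eigvalsh returns the real eigenvalues of a symmetric/Hermitian
    # matrix in ascending order; reverse to get descending order.
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    for k in range(eigvals.size):
        tail = eigvals[k:]  # candidate noise-subspace eigenvalues
        spread = tail.max() - tail.min()
        if spread <= rel_tol * tail.mean():
            return k
    # Fallback if even the single smallest eigenvalue fails the check
    # (for example, if it is slightly negative due to round-off).
    return eigvals.size


if __name__ == "__main__":
    # Synthetic covariance: two dominant eigenvalues on a flat noise floor.
    rng = np.random.default_rng(0)
    q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
    cov = q @ np.diag([10.0, 5.0, 1.0, 1.0, 1.0, 1.0]) @ q.T
    print(estimate_signal_dimension(cov))
```

With covariance matrices estimated from real data, the noise eigenvalues are only approximately equal, so a fixed tolerance like this would likely need tuning or replacement by a statistical criterion.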
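Call number: XWLW1366
Other identifier: 200618014628039
Language: Chinese
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/6192
Collection: Graduates_Doctoral Theses
Recommended citation (GB/T 7714):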
程宁. 基于听觉感知特性的麦克风阵列语音增强算法研究[D]. 中国科学院自动化研究所. 中国科学院研究生院,2009.
Files in this item:
File name/size: CASIA_20061801462803 (3399KB)
Access type: Not yet open
License: CC BY-NC-SA