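Research on Robust Speech Recognition in Noisy Environments (噪声环境下的鲁棒语音识别研究)
Other Title: Research on Noise Robust Speech Recognition
Author: Jia Chuan (贾川)
Degree Type: Doctor of Engineering
Supervisor: Xu Bo (徐波)
Date: 2003-10-01
Degree-Granting Institution: Graduate School of the Chinese Academy of Sciences
Place of Degree Conferral: Institute of Automation, Chinese Academy of Sciences
Major: Pattern Recognition and Intelligent Systems
Keywords: Noise Robust Speech Recognition; Endpoint Detection; Speech Enhancement; Speech Feature Enhancement (Compensation); Model Compensation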
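Abstract: Speech recognition technology is maturing and recognition systems are moving into practical use. Improving their performance in background noise has therefore become one of the key problems in making them practical. This thesis begins with a review and analysis of existing algorithms for noise-robust recognition. Guided by how noise affects speech in the signal, feature, and model spaces, it then carries out extensive research on endpoint detection, speech enhancement, speech feature enhancement, and the joint use of speech model compensation and feature compensation:
1. Endpoint detection plays an important role in speech recognition. This thesis studies the speech spectral entropy feature in depth. It proposes introducing a constant into the computation of the feature's probability density function, which yields an improved spectral entropy feature, together with a corresponding endpoint detection strategy. The improved spectral entropy separates speech from noise more easily. Moreover, a different constant is introduced at each signal-to-noise ratio (SNR), so the improved spectral entropy is almost unaffected by changes in SNR, and thresholds are easier to set and tune. Extensive experiments show that this endpoint detection algorithm greatly improves on basic spectral entropy. Its accuracy is also much higher than that of conventional energy-based endpoint detection.
2. Speech enhancement algorithms can effectively improve the perceived quality and intelligibility of speech. This thesis analyzes the shortcomings of the AR-HMM-based maximum a posteriori (MAP) estimator at low SNR. It proposes a codebook-constrained Wiener filtering algorithm that constrains the mixture of Wiener filters in the original algorithm, so that the filters satisfy vocal-tract spectral constraints described by a codebook. The proposed framework yields some improvement in output SNR and perceived quality. Used as a front end to a speech recognizer, this enhancement algorithm also improves recognition performance.
3. Speech feature enhancement or compensation algorithms aim to clean the speech so that it matches the training environment, which improves recognition performance. This thesis assumes that the environment mismatch caused by additive noise can be represented as an additive bias in the power spectral domain. Because the bias corresponds to the noise power spectrum, the thesis proposes incorporating prior knowledge of the noise into bias estimation within a MAP framework. Most noise is nonstationary, so it is necessary both to track changes in the additive spectral bias and to update the noise statistics in real time. The thesis therefore works within the MAP framework and uses sequential estimation techniques based on the Kullback-Leibler information measure. These techniques adaptively estimate the spectral bias and update the parameters of the noise prior distribution, which enhances the speech spectral features. Preliminary speech recognition experiments show that the proposed algorithm outperforms sequential maximum likelihood estimation and is clearly better than batch methods in nonstationary noise.
4. Speech model compensation algorithms aim to make the adapted speech models match the training environment. To improve system performance in nonstationary noise, this thesis combines the respective strengths of model compensation and feature compensation and proposes jointly compensating for nonstationary noise in both spaces. The nonstationary noise is decomposed into a constant part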
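The spectral entropy approach to endpoint detection described above can be sketched as follows. This is a minimal sketch that assumes the constant is simply added to every power-spectrum bin before normalization, i.e. p_i = (S_i + k) / Σ_j (S_j + k). The function names, the constant `k`, the toy spectra, and the fixed threshold are all illustrative assumptions. The thesis's SNR-dependent constants and its actual endpoint detection strategy are not reproduced.

```python
import math
from typing import List, Sequence


def spectral_entropy(power_spectrum: Sequence[float], k: float = 0.0) -> float:
    """Entropy of one frame's normalized power spectrum.

    Assumed form: p_i = (S_i + k) / sum_j (S_j + k), H = -sum_i p_i * ln(p_i).
    k = 0 gives the basic spectral entropy; k > 0 is the (hypothetical)
    constant-shifted variant.
    """
    shifted = [s + k for s in power_spectrum]
    total = sum(shifted)
    if total <= 0.0:
        raise ValueError("frame must have positive total power")
    probs = [s / total for s in shifted]
    # Bins with zero probability contribute nothing (0 * ln 0 -> 0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def detect_speech_frames(
    frames: Sequence[Sequence[float]], k: float, threshold: float
) -> List[bool]:
    """Flag a frame as speech when its entropy falls below a fixed threshold.

    Speech spectra tend to be peaked (low entropy) and noise spectra flatter
    (high entropy). The fixed threshold is a hypothetical simplification.
    """
    return [spectral_entropy(frame, k) < threshold for frame in frames]


if __name__ == "__main__":
    noise = [0.5, 0.1, 0.3, 0.1]      # toy low-energy, irregular spectrum
    speech = [100.0, 5.0, 1.0, 1.0]   # toy high-energy, peaked spectrum
    for k in (0.0, 1.0):
        gap = spectral_entropy(noise, k) - spectral_entropy(speech, k)
        print(f"k={k}: entropy gap (noise - speech) = {gap:.3f}")
    print(detect_speech_frames([noise, speech], k=1.0, threshold=1.0))
```

With these toy spectra, the noise/speech entropy gap grows from about 0.87 (k = 0) to about 0.99 (k = 1). The constant pushes the low-energy noise frame toward a flat distribution with entropy near ln 4, while it barely changes the high-energy speech frame. A constant that is large relative to the speech energy would flatten speech frames too and shrink the gap. This is why the constant has to be matched to the SNR.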
Other Abstract: Speech recognition technology has made great progress, and speech recognition systems have recently been deployed in commercial applications. One of the most important remaining problems is compensating for the negative effects of noise on ASR performance. Based on a summary and analysis of algorithms for noise-robust ASR, and on how noise affects the speech signal, feature, and model spaces, I investigated endpoint detection, speech enhancement, speech feature enhancement, and the combination of model compensation and feature compensation. Endpoint detection is important for ASR. Based on an in-depth investigation of speech spectral entropy, I proposed altering the spectral probability density function used in the entropy by introducing a positive constant, and I established a corresponding endpoint detection strategy. The resulting spectral entropy greatly improves the discriminability between speech and noise. Moreover, a different constant is introduced at each signal-to-noise ratio (SNR), so the enhanced spectral entropy is almost unaffected by changes in SNR, and the thresholds are easy to set and tune. Extensive experiments demonstrated its superiority over both basic spectral entropy and the conventional energy-based approach. Speech enhancement algorithms can improve perceived speech quality and intelligibility. Based on an analysis of the deficiency of the AR-HMM-based maximum a posteriori (MAP) estimator at low SNR, I proposed incorporating a codebook-constrained Wiener filter into the MAP framework. This imposes spectral constraints on the mixture of Wiener filters so that they satisfy vocal tract constraints described by a codebook. Two objective measures, output SNR and the Itakura-Saito distortion measure, verified the quality improvement of the proposed method.
As a preprocessor for ASR, the method also improves recognition accuracy. Speech feature enhancement and compensation aim to clean speech features so that they match the training environment. The environment mismatch due to additive noise is assumed to be an additive bias in the power spectral domain. Because the bias is intrinsically related to the noise power spectrum, it is valuable to introduce prior knowledge of the noise into the bias estimation process using the MAP criterion. Moreover, the mismatch is usually nonstationary in real applications, so it is necessary to track the time-varying additive bias and to update the noise statistics adaptively. I therefore proposed using sequential techniques based on the Kullback-Leibler information measure to estimate the additive bias and to update the parameters of the noise prior sequentially within the MAP framework. Speech recognition experiments demonstrated that the proposed algorithm outperformed the sequential maximum likelihood estimation method and was clearly better than the ba
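The additive-bias MAP estimation described above can be illustrated with a deliberately simplified sketch. It assumes one power-spectral bin with y_t = x_t + b, where the clean power follows a single Gaussian x_t ~ N(μ_x, σ_x²) and the bias has a Gaussian prior b ~ N(μ_0, σ_0²) derived from noise statistics. Under these assumptions the MAP estimate has a closed form. The thesis uses HMM-based models and a sequential estimator based on the Kullback-Leibler information measure, and neither is reproduced here. Instead, a hypothetical forgetting factor `forget` stands in for sequential adaptation. All class, function, and parameter names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Sequence


def batch_map_bias(
    ys: Sequence[float], mu_x: float, var_x: float, prior_mean: float, prior_var: float
) -> float:
    """Closed-form MAP estimate of a constant bias b from all frames at once.

    Assumed model for one power-spectral bin: y_t = x_t + b with
    x_t ~ N(mu_x, var_x) and Gaussian prior b ~ N(prior_mean, prior_var).
    """
    numerator = prior_mean / prior_var + sum((y - mu_x) / var_x for y in ys)
    denominator = 1.0 / prior_var + len(ys) / var_x
    return numerator / denominator


@dataclass
class SequentialBiasMAP:
    """Frame-by-frame MAP bias tracker with exponential forgetting.

    forget = 1.0 reproduces batch_map_bias exactly; forget < 1.0 discounts
    old frames and the initial prior so the estimate can follow a
    time-varying bias. The forgetting factor is a stand-in, not the
    Kullback-Leibler-based sequential estimator described in the thesis.
    """
    mu_x: float
    var_x: float
    prior_mean: float
    prior_var: float
    forget: float = 1.0
    precision: float = field(init=False, default=0.0)
    weighted_sum: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        # Start from the noise-derived prior: precision and precision-weighted mean.
        self.precision = 1.0 / self.prior_var
        self.weighted_sum = self.prior_mean / self.prior_var

    @property
    def bias(self) -> float:
        """Current MAP estimate of the additive bias."""
        return self.weighted_sum / self.precision

    def update(self, y: float) -> float:
        """Absorb one observed power value and return the updated MAP bias."""
        self.precision = self.forget * self.precision + 1.0 / self.var_x
        self.weighted_sum = self.forget * self.weighted_sum + (y - self.mu_x) / self.var_x
        return self.bias

    def enhance(self, y: float) -> float:
        """Remove the current bias estimate, flooring power at zero."""
        return max(y - self.bias, 0.0)


if __name__ == "__main__":
    # Toy bias that jumps from 2 to 10 halfway; clean power fixed at mu_x = 0.
    ys = [2.0] * 50 + [10.0] * 50
    static = SequentialBiasMAP(mu_x=0.0, var_x=1.0, prior_mean=2.0, prior_var=1.0)
    tracking = SequentialBiasMAP(mu_x=0.0, var_x=1.0, prior_mean=2.0, prior_var=1.0, forget=0.9)
    for y in ys:
        static.update(y)
        tracking.update(y)
    print(f"forget=1.0: {static.bias:.2f}, forget=0.9: {tracking.bias:.2f}")
```

With `forget = 1.0` the recursion equals the batch MAP estimate. After the toy bias jumps from 2 to 10, that estimate only reaches about 5.96, while `forget = 0.9` reaches about 9.96. This small example illustrates why sequential adaptation matters once the noise is nonstationary.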
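Call Number: XWLW800
Other Identifier: 800
Language: Chinese
Document Type: Degree Thesis
Item Identifier: http://ir.ia.ac.cn/handle/173211/5787
Collection: Graduates_Doctoral Dissertations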
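Recommended Citation
GB/T 7714
Jia Chuan. Research on Robust Speech Recognition in Noisy Environments[D]. Institute of Automation, Chinese Academy of Sciences. Graduate School of the Chinese Academy of Sciences, 2003.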