Research on Model Adaptation Techniques in Robust Speech Recognition (鲁棒性语音识别中模型适应技术的研究)
Alternative title: The Research of the Model Adaptation in Robust Speech Recognition
缪彩练
Degree type: Doctor of Engineering
Advisor: 王阳生
2003-07-01
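Degree-granting institution: Graduate School of the Chinese Academy of Sciences
Place of conferral: Institute of Automation, Chinese Academy of Sciences
Major: Pattern Recognition and Intelligent Systems
Keywords: Parallel Model Combination (PMC); Enhancement Technique; Maximum Likelihood Estimation; Residual Noise Model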
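Abstract: Robustness techniques for speech recognition address the degradation in recognition performance caused by acoustic mismatch between the test and training environments, and are an important direction in current speech technology. Among them, model adaptation techniques, represented by Parallel Model Combination (PMC), occupy an important place. After analyzing the basic principles of PMC, this dissertation proposes targeted solutions to its limitations: a maximum likelihood estimation method for convolutional noise; weighted processing of the static and dynamic vectors of the combined model, which reduces computation and improves the recognition rate; and model splitting and combination, together with an added correlation term between the clean speech and noise vectors, which reduce the inaccuracy introduced by the assumptions and approximations.
Building on further study of PMC, the dissertation proposes an improved PMC method that integrates signal enhancement (denoising) with environment adaptation. Signal enhancement is applied as preprocessing to both the adaptation data and the test data, restoring the noisy data to clean speech as far as possible. Model adaptation is then applied: the residual additive and convolutional noise is estimated by maximum likelihood from the enhanced adaptation data, so that the combined model (called the enhanced speech model) better matches the preprocessed test data. The dissertation also introduces a new concept, the residual noise model, which serves as a joint compensation model for the residual additive and convolutional noise and is defined directly as an additive effect on the speech signal in the cepstral domain. Enhancement thus raises the signal-to-noise ratio, and the entire PMC procedure is carried out in the cepstral domain, which removes the transformations between domains and overcomes many weaknesses of conventional PMC. The new PMC technique can further improve recognition performance and adaptability to the environment.
The experiments were built on Cambridge University's HTK speech recognition toolkit, with the new PMC algorithm embedded, for continuous recognition of digit strings composed of the ten Mandarin digits 0 to 9. The algorithm was tested in a range of noise conditions, including artificially added noise and real noisy scenes. The results show that the new PMC technique significantly improves the recognition rate in these conditions.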
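For orientation, the domain conversions that conventional PMC relies on can be sketched with the widely used log-add approximation for static means. This is a minimal illustration and not the dissertation's implementation: it assumes untruncated cepstra and an orthonormal DCT (so the transform is exactly invertible), it ignores variances and dynamic parameters, and every function name is hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n); its transpose is its inverse."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                     for i in range(n)])
    return rows

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def log_add_combine(speech_cep, noise_cep):
    """Combine clean-speech and noise mean vectors (log-add approximation).

    Assumption: both inputs are full-length cepstral means, so the DCT is
    square. Real front ends truncate the cepstrum and use a filterbank.
    """
    n = len(speech_cep)
    c = dct_matrix(n)
    c_inv = transpose(c)
    # Cepstral domain -> log-spectral domain.
    speech_log = matvec(c_inv, speech_cep)
    noise_log = matvec(c_inv, noise_cep)
    # Speech and noise add in the linear spectral domain.
    noisy_log = [math.log(math.exp(s) + math.exp(v))
                 for s, v in zip(speech_log, noise_log)]
    # Log-spectral domain -> cepstral domain.
    return matvec(c, noisy_log)
```

Each combined mean requires two transforms plus an exponential and a logarithm per channel, which is the cost the abstract refers to when it mentions transformations between domains.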
Other abstract: Robustness techniques, which counter the performance degradation caused by mismatch between training and test environments, have become an active topic in speech recognition. Environment adaptation methods such as PMC (Parallel Model Combination) play an important part in improving robustness. This dissertation investigates the fundamental principles of PMC and analyzes the limitations that restrict its application: estimation of the convolutional noise model remains an open problem; the approximations and assumptions made in PMC are not accurate enough; the computational cost is high; and performance degrades at low SNR. The dissertation proposes the following solutions: the convolutional noise model is estimated by maximum likelihood (ML); the variance of the combined model is computed as a weighted sum of the variance vectors of the noise and speech models, which reduces the computational cost; and model splitting and combination, together with estimation of the cross-term between speech and noise, partly overcome the inaccuracy due to the approximations and assumptions. A new PMC is then proposed: enhancement is used as preprocessing to restore the noisy speech signal as closely as possible to clean speech; the remaining additive and convolutional noise is then estimated by ML, and model adaptation is applied so that the combined models better match the enhanced speech. A new residual noise model and enhanced speech models are thereby introduced into PMC. The modified PMC has these advantages: enhancement raises the SNR; the residual noise acts as a joint bias compensation for additive and convolutional noise; and, based on the new mismatch function, the bias parameters are combined only once, in the cepstral domain, without domain transformation.
The experiments show that the new PMC, by combining signal enhancement with environment adaptation, increases robustness and improves performance in noisy conditions. The test platform was Cambridge's HTK toolkit 3.0, modified to embed the PMC algorithms and used for continuous Mandarin digit recognition. The training data were collected in a clean office environment, while the test data include artificial data contaminated by white Gaussian noise at different SNR levels as well as noisy speech collected in a real noisy environment. The experimental results were compared to show the effectiveness of PMC.
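The cepstral-domain compensation described in the abstract can be illustrated with a simple additive-bias mismatch model, y = x + r. The sketch below assumes diagonal-covariance Gaussians and a fixed frame-to-Gaussian alignment, and uses the standard closed-form ML estimate of a single bias vector. This record does not give the dissertation's actual residual noise model or estimation procedure, so the function names and the single-bias simplification are assumptions.

```python
def estimate_cepstral_bias(frames, means, variances):
    """ML estimate of one additive cepstral bias r under y = x + r.

    frames[t]    : observed (enhanced) cepstral vector at frame t
    means[t]     : mean of the Gaussian aligned to frame t (assumed known)
    variances[t] : diagonal variance of that Gaussian
    With the alignment fixed, the ML bias per dimension is the
    inverse-variance-weighted average of (frame - mean).
    """
    dim = len(frames[0])
    num = [0.0] * dim
    den = [0.0] * dim
    for obs, mu, var in zip(frames, means, variances):
        for d in range(dim):
            num[d] += (obs[d] - mu[d]) / var[d]
            den[d] += 1.0 / var[d]
    return [n / d for n, d in zip(num, den)]

def shift_means(model_means, bias):
    """Compensate model means directly in the cepstral domain."""
    return [[m + b for m, b in zip(mu, bias)] for mu in model_means]
```

Because the bias is added to the cepstral means directly, no conversion to the log-spectral or linear spectral domain is needed. In practice the alignment would itself be re-estimated, for example inside an EM loop.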
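Call number: XWLW755
Other identifier: 755
Language: Chinese
Document type: Dissertation
Item identifier: http://ir.ia.ac.cn/handle/173211/5779
Collection: Graduates_Doctoral Dissertations
Recommended citation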
GB/T 7714
缪彩练. 鲁棒性语音识别中模型适应技术的研究[D]. 中国科学院自动化研究所. 中国科学院研究生院,2003.