Research on Knowledge-Guided Search Algorithms for Mandarin Speech Recognition (知识引导的汉语语音识别搜索算法研究)


Title	知识引导的汉语语音识别搜索算法研究
Alternative title	Knowledge Guided Decoding Algorithms for Mandarin Speech Recognition
Author	杨占磊
	2012-05-30
Degree type	Doctor of Engineering
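Chinese abstract (translated)	The search algorithm of a speech recognizer, also called the decoding algorithm, is one of the core components of a speech recognition system and directly determines system performance. It uses the acoustic model, the language model, and higher-level knowledge such as syntax and semantics to find the optimal word sequence in a search space built from the pronunciation lexicon. Decoding involves path extension, scoring, merging, and pruning; its complexity is high, so it has a major effect on decoding time. In practice, however, many decoding operations, including part of the path extensions, are unnecessary: even if an extension is carried out, the new paths it generates are deleted at the pruning stage because their scores are too low. Studying how to use relevant knowledge to reduce unnecessary decoding operations is therefore important for increasing recognition speed. In addition, although the acoustic and language models used by conventional decoding algorithms capture some phonetic and linguistic regularities at different levels, a single information source is far from sufficient to characterize the nature of speech, whereas a recognition system that fuses complementary information from multiple sources can effectively reduce the recognition error rate. This thesis mines speech-related knowledge beyond the acoustic and language models and uses it to guide the search process, with the aim of either reducing blind search or improving search accuracy, and ultimately improving decoding speed and accuracy. The main work is as follows. 1) This study first analyzes current mainstream recognition systems from the perspective of search space construction and search strategy. We then build a Mandarin large vocabulary continuous speech recognition decoding system using a search space construction method based on lexical tree re-entry and a frame-synchronous Viterbi beam search strategy. Because this system serves as the experimental platform for the subsequent research, we describe the path extension method used in the decoder in detail and report the system's performance on a large vocabulary continuous speech recognition task. 2) During decoding, path extensions can be divided into two patterns, intra-HMM extension and inter-HMM extension, according to where in the HMM the extension occurs. Through analysis and experiments, this study shows that inter-HMM extension has higher complexity than intra-HMM extension, and on that basis proposes performing only intra-HMM extension, and no inter-HMM extension, in regions where the sub-band energy of the speech is stable. The results show that a decoding system incorporating the pronunciation-stability information of speech frames effectively lowers the proportion of inter-HMM extensions among all extensions, while the number of effective inter-HMM extensions is not reduced, so restricting extensions does not introduce decoding errors. Experimental results show that the proposed algorithm improves decoding real-time performance by 22.1% over the baseline system; at the same decoding time, the relative error rate falls by 5.24%. 3) To exploit the position of a speech frame in the acoustic feature space, this study proposes a speech recognition decoding algorithm based on guide probability. We first collect statistics on the correspondence between each Gaussian component of the universal background model (UBM) and the phonemes to obtain the guide probability, and fuse it with the conventional acoustic probability and language model probability. With the guide probability, the decoder places more emphasis on a fine-grained search of the most promising local region of the acoustic feature space: it retains and extends paths that pass through this local region and weakens paths that do not. This study then analyzes, at the level of path scores, the role of the guide probability in extension and pruning, and examines how different UBM training and normalization methods, the guide probability weight, and the number of dominant Gaussians affect system performance. Experimental results show that, compared with the baseline system, the guide-probability-based decoding algorithm reduces the relative Chinese character error rate by 10.95%. In addition, this study reinterprets the decoding algorithm based on stable pronunciation segments from the perspective of computing the total path probability, and fuses it with the guide-probability-based decoding algorithm so that the stability information of speech frames and their position in the acoustic feature space can be used together. The results show that the fused system outperforms decoding systems that use only a single information source. 4) This study uses distances between Gaussian mixture distributions to characterize the similarity between different phonemes, as well as the size of each phoneme itself, in order to illustrate speaker...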
English abstract	The decoding algorithm, also known as the search algorithm, is one of the central modules of an automatic speech recognition (ASR) system and directly determines its performance. It looks for the optimal word sequence within the search space by drawing on the acoustic model (AM), the language model (LM), and other knowledge sources. The decoder generally has to handle complicated path manipulations such as extension, scoring, merging, and pruning, which often consume a lot of time. Some of these manipulations, however, are unnecessary. Take path extension as an example: even when paths are extended, the newly generated paths are very likely to be pruned immediately because of their low probabilities. Avoiding unnecessary manipulations by using relevant knowledge is therefore important and is expected to cut decoding time. Besides, although the AM and LM of a traditional ASR system describe speech from distinct aspects, these two knowledge sources are far from enough to capture the nature of speech, and additional complementary information is desirable for reducing the error rate. This thesis proposes several algorithms that direct the decoding process by taking advantage of speech-related knowledge beyond the traditional AM and LM. Decoding speed or accuracy can thus be improved either by reducing blind search operations or by increasing effective search operations. The research focuses on the following aspects. 1) This study first analyzes the search space construction methods and search strategies of current mainstream ASR systems. Then, using a search space construction method based on lexical tree re-entry and a frame-synchronous Viterbi beam search strategy, we build a decoding system for Mandarin large vocabulary continuous speech recognition (LVCSR). As the baseline for the following experiments, we give a detailed description of the decoder, followed by its performance on a Mandarin LVCSR task. 2) During decoding, path extensions can be classified into two patterns, intra-HMM extension and inter-HMM extension, according to the position where the extension happens. This study compares the complexities of the two patterns and finds that inter-HMM extension occupies the majority of decoding time. It therefore proposes a novel algorithm that restricts inter-HMM extension at energy-steady frames of speech. By incorporating the steady infoma...
Keywords	Speech Recognition; Decoding Algorithm; Path Extension; Probability Fusion
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6455
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	杨占磊. 知识引导的汉语语音识别搜索算法研究[D]. 中国科学院自动化研究所. 中国科学院研究生院,2012.
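
Illustrative note (not part of the thesis record): the abstract describes fusing a guide probability, derived from the correspondence between UBM Gaussian components and phonemes, with the acoustic and language model probabilities, and then pruning paths by score. The Python sketch below shows one way such a fusion could look. Everything specific in it is an assumption made for illustration: the function names, the count-based estimate of P(phoneme | Gaussian), the posterior-weighted sum over the top-N dominant Gaussians, the log-linear weights, and the beam rule are hypothetical and are not taken from the thesis.

```python
import math


def guide_table(counts):
    """Normalize co-occurrence counts[g][p] into P(phoneme p | Gaussian g).

    Assumption: the correspondence is estimated from simple counts.
    """
    table = {}
    for g, row in counts.items():
        total = sum(row.values())
        table[g] = {p: c / total for p, c in row.items()} if total > 0 else {}
    return table


def guide_log_prob(phoneme, posteriors, table, top_n=2, floor=1e-6):
    """Guide log-probability of `phoneme` for a single frame.

    posteriors maps a UBM Gaussian index to its posterior for this frame.
    Only the top_n dominant Gaussians contribute (hypothetical scheme).
    """
    top = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    mass = sum(w for _, w in top)
    if mass <= 0:
        return math.log(floor)
    p = sum(w * table.get(g, {}).get(phoneme, 0.0) for g, w in top) / mass
    return math.log(max(p, floor))


def fused_score(acoustic_logp, lm_logp, guide_logp, lm_weight=10.0, guide_weight=1.0):
    """Log-linear fusion of the three knowledge sources (weights are assumed)."""
    return acoustic_logp + lm_weight * lm_logp + guide_weight * guide_logp


def beam_prune(paths, beam):
    """Keep only paths whose score lies within `beam` of the best score."""
    best = max(paths.values())
    return {name: s for name, s in paths.items() if s >= best - beam}


if __name__ == "__main__":
    # Toy data: two Gaussians, two phonemes.
    table = guide_table({0: {"a": 3, "b": 1}, 1: {"a": 1, "b": 3}})
    frame_posteriors = {0: 0.8, 1: 0.2}
    paths = {
        ph: fused_score(-5.0, -2.0, guide_log_prob(ph, frame_posteriors, table), lm_weight=2.0)
        for ph in ("a", "b")
    }
    print(beam_prune(paths, beam=0.5))
```

In this toy setup, a path whose phoneme agrees with the frame's dominant Gaussians receives a higher fused score, so it is more likely to survive the beam. This mirrors, in a simplified way, the abstract's description of retaining paths that pass through the most promising local region of the acoustic feature space.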

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20091801462806 (1511KB)			Not yet open access	CC BY-NC-SA