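Knowledge-Guided Search Algorithms for Mandarin Speech Recognition (知识引导的汉语语音识别搜索算法研究)
Alternative Title: Knowledge Guided Decoding Algorithms for Mandarin Speech Recognition
杨占磊 (Yang Zhanlei)
Subtype: Doctor of Engineering
Thesis Advisor: 刘文举 (Liu Wenju)
2012-05-30
Degree Grantor: Graduate University of Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: Speech Recognition; Decoding Algorithm; Path Extension; Probability Fusion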
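Abstract: The search algorithm for speech recognition, also called the decoding algorithm, is one of the core components of a speech recognition system and directly determines system performance. It uses knowledge from the acoustic model, the language model, and higher-level syntax and semantics to find the optimal word sequence in a search space built from the pronunciation lexicon. Decoding involves path extension, scoring, merging, and pruning; these operations are computationally expensive and therefore strongly affect decoding time. In practice, however, many of these operations, including a share of the path extensions, are unnecessary: even when an extension is performed, the new path is removed at the pruning stage because its score is too low. Studying how to use relevant knowledge to reduce unnecessary decoding operations is therefore important for improving recognition speed. In addition, although the acoustic and language models used by traditional decoding algorithms capture some phonetic and linguistic regularities at different levels, a single information source is far from sufficient to characterize the nature of speech, whereas a recognition system that fuses complementary information from multiple sources can effectively reduce the recognition error rate.
This thesis mines speech-related knowledge beyond the acoustic and language models and uses it to guide the search process, either reducing blind search or improving search accuracy, and ultimately improving decoding speed and accuracy. The main contributions are as follows:
1) This study first analyzes current mainstream recognition systems in terms of search space construction and search strategy. We then build a decoding system for Mandarin large vocabulary continuous speech recognition, using a search space construction method based on lexical tree re-entry and a frame-synchronous Viterbi beam search strategy. Because this system serves as the experimental platform for the subsequent research, we describe the path extension method used in the decoder in detail and report the system's performance on a large vocabulary continuous speech recognition task.
2) During decoding, path extensions can be divided into two modes, intra-HMM extension and inter-HMM extension, according to where in the HMM they occur. Through analysis and experiments, this study shows that inter-HMM extension is more complex than intra-HMM extension, and on that basis proposes performing only intra-HMM extension, and no inter-HMM extension, in regions of speech where the sub-band energy is steady. The results show that a decoder incorporating the articulatory steadiness of speech frames effectively lowers the proportion of inter-HMM extensions among all extensions, while the number of effective inter-HMM extensions does not fall, so restricting extension introduces no decoding errors. Experiments show that the proposed algorithm improves real-time decoding performance by 22.1% over the baseline system; at equal decoding time, the relative error rate drops by 5.24%.
3) To exploit the position of a speech frame in the acoustic feature space, this study proposes a decoding algorithm based on a guiding probability. We first collect statistics on the correspondence between the Gaussian components of a universal background model (UBM) and phonemes to obtain the guiding probability, and then fuse it with the conventional acoustic and language model probabilities. With the guiding probability, the decoder puts more emphasis on a fine-grained search of the most promising local region of the acoustic feature space: paths passing through that region are kept and extended, while paths that do not pass through it are weakened. The study then analyzes, at the level of path scores, the role of the guiding probability in extension and pruning, and examines how different UBM training and normalization methods, the guiding probability weight, and the number of dominant Gaussians affect system performance. Experiments show that, compared with the baseline system, the guiding-probability decoding algorithm lowers the relative Chinese character error rate by 10.95%. In addition, the study reinterprets the steady-segment decoding algorithm from the perspective of total path probability and fuses it with the guiding-probability decoding algorithm, so that both the steadiness of speech frames and their position in the acoustic feature space are used. The results show that the fused system outperforms decoders that use only a single information source.
4) This study uses distances between Gaussian mixture distributions to characterize the similarity between different phonemes, as well as the size of each phoneme itself, in order to account for speaker...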
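The probability fusion described in item 3) can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the guiding probability is taken as the average of P(phone | Gaussian) over a frame's dominant UBM Gaussians, the fusion is a weighted sum of log probabilities, and all names (`guiding_log_prob`, `fused_score`, `beam_prune`, `gauss_phone_prob`, the weights) are hypothetical.

```python
import math

def guiding_log_prob(dominant_gaussians, phone, gauss_phone_prob, floor=1e-6):
    """Log guiding probability of `phone` for one frame (assumed form).

    dominant_gaussians: indices of the UBM Gaussians that score highest on the frame.
    gauss_phone_prob:   dict mapping (gaussian_index, phone) -> P(phone | gaussian),
                        assumed to be estimated beforehand from aligned training data.
    """
    if not dominant_gaussians:
        return math.log(floor)  # no evidence for this frame: fall back to the floor
    probs = [gauss_phone_prob.get((g, phone), floor) for g in dominant_gaussians]
    return math.log(max(sum(probs) / len(probs), floor))

def fused_score(acoustic_lp, lm_lp, guide_lp, lm_weight=10.0, guide_weight=1.0):
    """Weighted sum of log probabilities (an assumed log-linear fusion)."""
    return acoustic_lp + lm_weight * lm_lp + guide_weight * guide_lp

def beam_prune(path_scores, beam):
    """Keep only the paths whose score lies within `beam` of the best score."""
    if not path_scores:
        return {}
    best = max(path_scores.values())
    return {path: s for path, s in path_scores.items() if s >= best - beam}

# Toy example: two competing paths with equal acoustic and LM scores.
table = {(0, "a"): 0.8, (1, "a"): 0.6, (0, "b"): 0.1}
scores = {
    phone: fused_score(-50.0, -2.0, guiding_log_prob([0, 1], phone, table))
    for phone in ("a", "b")
}
survivors = beam_prune(scores, beam=2.0)
```

In this toy case the two paths differ only in their guiding term, so the path whose phoneme agrees with the frame's dominant Gaussians stays inside the beam and the other is pruned. That mirrors the "keep and extend" versus "weaken" behavior the abstract describes, under the stated assumptions.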
Other Abstract: The decoding algorithm, also known as the search algorithm, is one of the central modules of an automatic speech recognition (ASR) system and directly determines its performance. It searches for the optimal word sequence within the search space by drawing on knowledge from the acoustic model (AM), the language model (LM), and other sources. The decoder has to perform complicated path manipulations such as extension, scoring, merging, and pruning, which often consume a lot of time. Some of these manipulations, however, are unnecessary. Take path extension as an example: even when a path is extended, the newly generated path is very likely to be pruned immediately because of its low probability. Using relevant knowledge to avoid unnecessary manipulations is therefore important and is expected to cut decoding time. Moreover, although the AM and LM of a traditional ASR system describe speech from distinct aspects, these two knowledge sources are far from enough to capture the nature of speech, and additional complementary information is needed to reduce the error rate of an ASR system.
This thesis proposes several algorithms that direct the decoding process by exploiting speech-related knowledge in addition to the traditional AM and LM. Decoding speed or accuracy is improved either by reducing blind search operations or by increasing effective search operations. The research focuses on the following aspects.
1) This study first analyzes the search space construction methods and search strategies of current mainstream ASR systems. Then, using a search space construction method based on lexical tree re-entry and a frame-synchronous Viterbi beam search strategy, we build a decoding system for Mandarin large vocabulary continuous speech recognition (LVCSR). As the baseline for the following experiments, we give a detailed description of the decoder, followed by its performance on a Mandarin LVCSR task.
2) During decoding, path extensions can be classified into two patterns, intra-HMM extension and inter-HMM extension, according to the positions where the extensions happen. This study compares the complexities of the two patterns and finds that inter-HMM extension accounts for the majority of decoding time. It therefore proposes a novel algorithm that restricts inter-HMM extension at energy-steady frames of speech. By incorporating the steady informa...
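The restriction described in item 2) can be sketched as a per-frame gate on the extension types the decoder may perform. The abstract does not say how energy-steady frames are detected, so the criterion below (every sub-band's relative energy change from the previous frame stays under a threshold) is an assumption, as are the names `steady_frames`, `allowed_extensions`, and the default threshold.

```python
def steady_frames(band_energies, threshold=0.1):
    """Flag frames whose sub-band energies barely change from the previous frame.

    band_energies: list of frames, each a list of positive per-band energies.
    Returns one boolean per frame; the first frame has no predecessor and is
    treated as not steady. The relative-change criterion is an assumption.
    """
    if not band_energies:
        return []
    flags = [False]
    for prev, cur in zip(band_energies, band_energies[1:]):
        max_rel_change = max(abs(c - p) / max(p, 1e-10) for p, c in zip(prev, cur))
        flags.append(max_rel_change < threshold)
    return flags

def allowed_extensions(is_steady):
    """In steady frames only intra-HMM extension runs; elsewhere both run."""
    return ("intra",) if is_steady else ("intra", "inter")

# Toy example: four frames with two sub-bands each.
energies = [[1.0, 2.0], [1.02, 2.05], [3.0, 0.5], [3.1, 0.51]]
flags = steady_frames(energies)
plan = [allowed_extensions(f) for f in flags]
```

A decoder loop would consult `plan[t]` at frame `t` and skip the inter-HMM step when only `"intra"` is listed, which is where the time saving claimed in the abstract would come from.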
Shelf Number: XWLW1750
Other Identifier: 200918014628063
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/6455
Collection: Graduates_Doctoral Dissertations
Recommended Citation
GB/T 7714
杨占磊. 知识引导的汉语语音识别搜索算法研究[D]. 中国科学院自动化研究所. 中国科学院研究生院,2012.
Files in This Item:
File Name/Size: CASIA_20091801462806 (1511KB)
Access: Not yet open
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.