English abstract | The decoding algorithm, also known as the search algorithm, is one of the central modules of an automatic speech recognition (ASR) system and directly determines its performance. It searches the search space for the optimal word sequence, drawing on knowledge from the acoustic model (AM), the language model (LM), and other sources. In general, the decoder must handle complicated path manipulations such as extension, scoring, merging, and pruning, which often consume a great deal of time. Some of these manipulations, however, are unnecessary. Take path extension as an example: even after paths are extended, the newly generated paths are very likely to be pruned immediately because of their low probabilities. Avoiding unnecessary manipulations by using relevant knowledge is therefore important and is expected to cut decoding time. In addition, although the AM and LM of a traditional ASR system describe speech from distinct aspects, these two knowledge sources are far from sufficient to capture the nature of speech, and complementary information is needed to reduce the error rate of the ASR system. This paper proposes several algorithms that guide the decoding process by exploiting speech-related knowledge in addition to the traditional AM and LM. Decoding speed or accuracy can thus be improved by either reducing blind search operations or increasing effective search operations. This research focuses on the following aspects. 1) This study first analyzes the search space construction methods and search strategies of current mainstream ASR systems. Then, using a lexicon-reentry-based search space construction method and a frame-synchronous Viterbi beam search strategy, we build a decoding system for Mandarin large vocabulary continuous speech recognition (LVCSR). 
As the baseline for the following experiments, we give a detailed description of the decoder, followed by its performance on the Mandarin LVCSR task. 2) During decoding, path extensions can be classified into two patterns, intra-HMM extension and inter-HMM extension, according to where the extension happens. This study compares the complexities of the two patterns and finds that inter-HMM extension accounts for the majority of decoding time. Therefore, this study proposes a novel algorithm that restricts inter-HMM extension at energy-steady frames of speech. By incorporating the steady infoma... |
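
The abstract's observation that newly extended paths are often pruned right away can be illustrated with a minimal sketch of frame-synchronous Viterbi beam search. This is not the thesis's decoder: the function names (`viterbi_beam`, `prune`), the dense log-probability tables, and the two-state toy model are all assumptions made for illustration, and a real LVCSR decoder would search a lexicon-and-LM network rather than a flat state set.

```python
import math

def prune(hyps, beam):
    """Keep only hypotheses whose score is within `beam` of the best one."""
    best = max(score for score, _ in hyps.values())
    return {s: h for s, h in hyps.items() if h[0] >= best - beam}

def viterbi_beam(log_init, log_trans, log_emit, beam):
    """Frame-synchronous Viterbi search with beam pruning (illustrative).

    log_init[s]     : log probability of starting in state s
    log_trans[s][t] : log transition probability s -> t (-inf if forbidden)
    log_emit[f][s]  : log likelihood of frame f under state s
    beam            : log-score width; hypotheses further than this below
                      the best hypothesis of the frame are dropped
    Returns (best_log_score, best_state_path).
    """
    n_states = len(log_init)
    # active: state -> (accumulated log score, state path so far)
    active = {s: (log_init[s] + log_emit[0][s], [s])
              for s in range(n_states) if log_init[s] > -math.inf}
    if not active:
        raise ValueError("no state has a non-zero initial probability")
    active = prune(active, beam)

    for frame in log_emit[1:]:
        extended = {}
        for s, (score, path) in active.items():
            for t in range(n_states):
                if log_trans[s][t] == -math.inf:
                    continue  # transition not allowed
                new_score = score + log_trans[s][t] + frame[t]
                # Merging: keep only the best path arriving in state t.
                if t not in extended or new_score > extended[t][0]:
                    extended[t] = (new_score, path + [t])
        if not extended:
            raise ValueError("all hypotheses died; no allowed transition")
        # Pruning: many of the paths just extended are discarded here.
        active = prune(extended, beam)

    best_state = max(active, key=lambda s: active[s][0])
    return active[best_state]

if __name__ == "__main__":
    ln = math.log
    # Hypothetical two-state model and three frames of emission scores.
    init = [ln(0.6), ln(0.4)]
    trans = [[ln(0.7), ln(0.3)], [ln(0.4), ln(0.6)]]
    emit = [[ln(0.9), ln(0.1)], [ln(0.8), ln(0.2)], [ln(0.1), ln(0.9)]]
    print(viterbi_beam(init, trans, emit, beam=1.0))
```

Every hypothesis removed by `prune` was first extended and scored, which is the wasted work the abstract refers to; a narrower `beam` removes more hypotheses per frame at the risk of discarding the path that would have won.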