Language models play an irreplaceable role in speech recognition systems. As speech recognition technology matures and becomes practical, it must cope with ever broader application areas and more complex backgrounds, so high-performance language models are required to reduce the search space and thereby improve recognition speed and accuracy. In some limited domains, however, it is difficult to collect enough text to train a language model, and the model must then be trained according to the features of the available corpus.
The work of this paper is as follows. We surveyed the current state of language modeling and studied several language modeling algorithms. We then built Chinese language models: the corpus was first preprocessed (special-character handling, Chinese word segmentation, and new word detection), and the models were then estimated from the processed text. We proposed a novel interpolated language model that combines interpolation with backing-off along a class hierarchy. A cluster tree is used to balance the generalization ability of classes against the specificity of words when estimating the likelihood of an n-gram event, and interpolation is used to bring in lower-order n-gram information.
We also presented a new framework for language model adaptation based on modifying the structure of the background corpus and of the background language model. Within this framework, widely used adaptation approaches such as Linear Interpolation (LI) and Minimum Discrimination Information (MDI) are used to modify the structure of the trained background language model, while the Maximum A Posteriori (MAP) approach is used to modify the structure of the background corpus. Experiments show that both techniques in the new framework yield a significant reduction in perplexity relative to the LI and MDI methods in the general adaptation framework, of about 5.2% and 36.8%, respectively.
Finally, we attempted to build a language model for a small, domain-dependent but characterless Chinese corpus. We first extended the corpus artificially and then applied the LI method within the proposed adaptation framework. The word error rate of speech recognition was reduced by 0.7% absolute.
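As a rough illustration of the linear interpolation (LI) adaptation mentioned above, the sketch below mixes a background model with a small domain model and compares perplexity on held-out domain text. It is a toy under stated assumptions, not the models or framework of this paper: it uses add-alpha smoothed unigrams rather than n-grams, the tiny English corpora are invented for the example, and the interpolation weight `lam=0.7` and all function names are hypothetical.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(p_background, p_domain, lam):
    """Linear interpolation: P(w) = lam * P_domain(w) + (1 - lam) * P_background(w)."""
    return {w: lam * p_domain[w] + (1.0 - lam) * p_background[w]
            for w in p_background}


def perplexity(model, tokens):
    """Perplexity = exp of the negative mean log-probability per token."""
    log_prob = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))


# Invented toy data (assumption): a general corpus and a small domain corpus.
background = "the market rose today the weather is fine today".split()
domain = "turn left at the next junction turn right".split()
held_out = "turn left at the junction".split()

# Both models share one vocabulary so they can be mixed word by word.
vocab = set(background) | set(domain) | set(held_out)
p_bg = unigram_model(background, vocab)
p_dom = unigram_model(domain, vocab)
p_mix = interpolate(p_bg, p_dom, lam=0.7)  # lam is an assumed weight

ppl_background = perplexity(p_bg, held_out)
ppl_adapted = perplexity(p_mix, held_out)
```

In practice the weight would normally be tuned on held-out domain data rather than fixed by hand, and the same mixing formula is applied to conditional n-gram probabilities instead of unigrams.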