This thesis investigates the theory and application of language models for Chinese character recognition. Language models play an increasingly important role in the post-processing stage of Chinese character recognition, where they can improve the performance of the whole recognition system. Building on several traditional language models, we propose several new, practical ones. To build the training corpus, we reviewed sample texts containing about 4.5 million Chinese characters and derived from them a dictionary of 3,200 Chinese characters. Combined with this corpus and dictionary, the new language models all perform well.

The thesis discusses language models systematically. We first explain the fundamental mechanism of the N-gram statistical language model before constructing one. Then, on the basis of the HMM (Hidden Markov Model), we analyze data smoothing methods. Finally, we introduce the cache model, which can capture long-distance information, and the N-class model, which is based on POS (part of speech).

Taking the characteristics of Chinese character recognition into account, we first introduce a 5-gram combined model, which captures both the forward and the backward statistical characteristics of a word. Second, to reflect the structural features of each line of the test text, we introduce a variable-length language model. Unlike previous approaches, in which the language model was fixed in advance, it selects the language model automatically. Finally, we introduce a further language model that raises the recognition rate when errors are dense within a sentence.

At the end of the thesis, we discuss the application of language models in a predictive system. After automatically categorizing the document stream, the recognition system uses a language model to predict Chinese characters accurately; the automatic categorization allows the model to make targeted predictions. With this predictive language model, the workload of the recognizer is reduced and the accuracy of the whole system is raised. All of these language models can be used not only in the post-processing of Chinese character recognition but also, with some modification, in speech recognition, machine translation, and other applications.
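
As a rough illustration of how an N-gram language model can post-process recognizer output, the following Python sketch trains a character bigram model and uses it to choose among the candidate characters a recognizer proposes for each position. It is not the thesis's implementation: the toy corpus, the add-one smoothing, the Viterbi search, and all names (`train_bigram`, `log_prob`, `decode`) are assumptions made for this example only.

```python
import math
from collections import Counter

START = "^"  # hypothetical sentence-start marker

def train_bigram(corpus):
    """Count bigrams and how often each character occurs as a left context."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        prev = START
        for ch in sentence:
            contexts[prev] += 1
            bigrams[(prev, ch)] += 1
            vocab.add(ch)
            prev = ch
    return contexts, bigrams, vocab

def log_prob(prev, ch, contexts, bigrams, vocab_size):
    """Add-one smoothed log P(ch | prev) (assumed smoothing method).

    Characters outside the training vocabulary also receive the pseudo-count,
    so these toy estimates are not strictly normalized.
    """
    return math.log((bigrams[(prev, ch)] + 1) / (contexts[prev] + vocab_size))

def decode(candidates, contexts, bigrams, vocab_size):
    """Viterbi search over per-position candidate lists; returns the best string."""
    best = {START: (0.0, "")}  # last character -> (log score, path so far)
    for options in candidates:
        new_best = {}
        for ch in options:
            new_best[ch] = max(
                (score + log_prob(prev, ch, contexts, bigrams, vocab_size), path + ch)
                for prev, (score, path) in best.items()
            )
        best = new_best
    return max(best.values())[1]

# Toy training corpus and recognizer candidates (both invented for illustration).
corpus = ["中国人民", "人民中国", "中国人"]
contexts, bigrams, vocab = train_bigram(corpus)
candidates = [["申", "中"], ["围", "国"], ["入", "人"]]
result = decode(candidates, contexts, bigrams, len(vocab))
print(result)  # 中国人
```

Here the visually similar candidates 申, 围 and 入 are rejected because the bigram statistics favor the sequence 中国人. A real system would also weight each candidate by the recognizer's own confidence, which this sketch omits.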