This thesis proposed our novel statistical approaches on Chinese word analysis, utterance segmentation and automatic evaluation of machine translation (MT). Word analysis is the first step for most application based on Chinese language technologies; utterance segmentation is the bridge which connects speech recognition and text translation in a speech translation system; automatic evaluation of machine translation (MT) system can speed the research and development of a MT system, reduce its developing cost. In short, the three aspects all belong to the basic research area of Natural Language Processing (NLP) and have significant meaning to many important applications such as text translation, speech translation and so on. In Chinese word analysis, we proposed a novel unified approach based on HMM, which efficiently combine word segmentation, Part of Speech (POS) tagging and Named Entity (NE) recognition. Our first model is a class-based HMM. So as to increase its accuracy, we introduce into the word-based HMM and combine it with the class-based HMM. At last we used a "word-to-character" smoothing method for predicting the probability of those words which don' t occur in the training set. The experimental results show that our combined model, by comprehensively considering the information of Chinese characters, words, POS and NE, achieved much better performance in the precision and recall of the Chinese word segmentation. Based on the knowledge of our combined model, we described the details in implementing the general word segmentation system APCWS. We discussed some technical problems in the data saving and loading, and described our modules of knowledge management and word lattice construction. In utterance segmentation, this paper proposed a novel approach which was based on a bi-directional N-gram model and Maximized Entropy model. This novel method, which effectively combines the normal and reverse N-gram algorithm, is able to make use of both the left and right context of the candidate site and achieved very good performance in utterance segmentation. We conducted experiments both in Chinese and in English. The results showed the effect of our novel method was much better than the normal N-gram algorithm. Then by analyzing the experimental results, we found the reason why our novel method achieved better results: it on one hand retained the correct segmentation of the normal N-gram algorithm, on the other hand avoided the incorrect segmentation by making use of reverse N-gram algorithm. In automatic evaluation of MT systems, we first introduced two classic methods on automatic evaluation which relied on reference translations. Then we proposed our novel sentence fluency evaluation method based on N-gram model. This method, called as E3, doesn't need any reference translations and achieved very well evaluation performance by discrimi
修改评论