In recent years, statistical machine translation (SMT) has become the dominant approach to machine translation (MT), as its models have evolved from word-based to phrase-based. Phrase-based models face major problems, including global reordering, phrase matching, and partitioning limitations. As a result, syntax-based SMT is becoming an increasingly attractive area of MT research. At the same time, MT evaluation, the twin of translation system development, has drawn growing attention from MT researchers. Evaluation makes it possible to classify, generalize, and summarize translation problems. This in turn helps identify the factors that limit translation quality, and so gives a strong impetus to MT development.

Within the framework of formally syntax-based SMT, this dissertation uses linguistic knowledge at different granularities to bring syntactic structure into SMT. It also examines the usability of SMT by analyzing the errors of a translation system, and proposes an automatic evaluation method for spoken language translation. The main contributions of this dissertation are summarized as follows:

1. Construction and optimization of a large-scale hierarchical phrase-based (HPB) SMT platform, and development of a suffix-array-based hierarchical phrase extraction method. This part of the work builds an efficient experimental platform for studying formally syntax-based SMT models and algorithms. Training translation models is costly in both time and space. To reduce this cost, we propose a hierarchical phrase extraction method based on suffix arrays. The training sentences are converted into a table structure that records the positions of their substrings, so that phrases can be retrieved with highly efficient search algorithms.
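Suffix-array phrase lookup can be illustrated with a minimal sketch. It is not the dissertation's implementation. It assumes whitespace-tokenized sentences, a hypothetical boundary token `SENT_END` that never occurs in the text, and naive suffix-array construction. The helper names `build_corpus`, `build_suffix_array` and `locate` are hypothetical. The sketch only finds contiguous phrases. Matching hierarchical phrases with gaps (X) would also need to combine the occurrence lists of their contiguous pieces, which is not shown.

```python
SENT_END = "</s>"  # hypothetical sentence-boundary marker, assumed absent from the text


def build_corpus(sentences):
    """Concatenate tokenized sentences into one token list.

    For every position, also record (sentence id, word offset), so that a hit
    in the suffix array can be mapped back to the training sentence it came from.
    """
    tokens, origin = [], []
    for sid, sentence in enumerate(sentences):
        for offset, word in enumerate(sentence.split()):
            tokens.append(word)
            origin.append((sid, offset))
        # The boundary marker keeps matches from spanning two sentences.
        tokens.append(SENT_END)
        origin.append((sid, -1))
    return tokens, origin


def build_suffix_array(tokens):
    """Sort all suffix start positions lexicographically (naive construction)."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def _bound(tokens, sa, phrase, strict):
    """Binary search over the suffix array.

    strict=False returns the first index whose length-m prefix is >= phrase.
    strict=True returns the first index whose length-m prefix is > phrase.
    """
    lo, hi, m = 0, len(sa), len(phrase)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = tokens[sa[mid]:sa[mid] + m]
        if prefix < phrase or (strict and prefix == phrase):
            lo = mid + 1
        else:
            hi = mid
    return lo


def locate(tokens, origin, sa, phrase):
    """Return the sorted (sentence id, offset) of every occurrence of phrase."""
    words = phrase.split()
    lo = _bound(tokens, sa, words, strict=False)
    hi = _bound(tokens, sa, words, strict=True)
    return sorted(origin[sa[k]] for k in range(lo, hi))


if __name__ == "__main__":
    tokens, origin = build_corpus(["the red house", "a red house is here", "the house"])
    sa = build_suffix_array(tokens)
    print(locate(tokens, origin, sa, "red house"))  # [(0, 1), (1, 1)]
```

All suffixes that begin with a given phrase sit next to each other in the sorted array. Each lookup therefore takes two binary searches rather than a scan of the whole training corpus. This is why an index of this kind can replace precomputing and storing every phrase. A production system would use a linear-time or O(n log n) construction instead of the naive sort shown here.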
In addition, several new techniques are introduced into the decoding module to improve translation performance on large-scale training data.

2. Proposal of strategies for incorporating linguistic knowledge of different granularities into HPB. A CRF-based chunking method was first introduced into the formally syntax-based SMT model to build a statistical machine translation system that uses hierarchical chunk phrases. This is an initial attempt to apply shallow parsing knowledge to HPB. Dependency parsing results were then integrated into HPB as syntactic knowledge. On one hand, the re...