Statistical approaches have dominated machine translation research in recent years. Statistical machine translation systems have evolved from word-based models to phrase-based models and then to syntax-based models, incorporating more contextual and reordering information into the translation model at each step. The hierarchical phrase-based model is particularly popular because it combines the advantages of the conventional phrase-based model with those of synchronous context-free grammar rules. This thesis discusses the design and implementation of such a system and investigates possible ways to reduce its explosive rule redundancy. The contributions of this work are summarized as follows: (1) We implemented a hierarchical phrase-based translation engine. On Chinese-English translation tasks, the translation quality of this engine is significantly higher than that of conventional phrase-based engines. In particular, this engine gives better translations when long-distance reordering is needed. (2) We proposed a sweep-line-based algorithm for extracting hierarchical rules from a word-aligned bilingual corpus. The algorithm extracts all hierarchical rules in one scan of the source training sentence and is easy to implement. Experiments showed that it performs well. (3) We analyzed the rule redundancy problem in the hierarchical phrase-based translation model and proposed a rift constraint that effectively reduces the number of rules in the system and accordingly brings a dramatic reduction in training and decoding time, with little sacrifice of translation quality.