Machine translation investigates the use of computers to translate text or speech from one language to another. In the era of the knowledge economy, increasingly frequent international communication and accelerating globalization are rapidly increasing the volume of cross-language information exchange, and the natural language barrier between countries and regions is becoming more and more prominent. As a computer technology for overcoming the language barrier, machine translation plays an increasingly important role in economic development and social life.

Statistical machine translation has become the mainstream approach to machine translation, and within it phrase-based models remain an active area of research. However, three main problems hold back the development of phrase-based models: a lack of robustness in phrase table construction, the poor generalization ability of contiguous phrases, and poor phrase reordering. This thesis focuses on two subfields of phrase-based statistical machine translation, namely phrase extraction methods and phrase reordering models. The major contributions of this thesis are as follows:

(1) We propose a flexible-scale method for extracting phrase translation pairs. Phrase translation pair extraction is the key technique in phrase-based statistical machine translation. Och's phrase extraction method is the most widely applied; it depends heavily on word alignments and extracts only those phrase pairs that are fully consistent with them. We propose a phrase pair extraction method with a flexible scale, which retains the merits of Och's method while also extracting phrase alignments that Och's method cannot obtain. The flexible scale is based on two features: part-of-speech (POS) information and dictionary information.
Our method relaxes the restriction of "total consistency with word alignment" and can find corresponding target phrases for more source phrases. In this way, our method extracts more translation information from parallel data and improves the translation performance of phrase-based statistical machine translation.

(2) We propose a generalized reordering model for phrase-based statistical machine translation, which introduces non-contiguous phrases into bracketing transduction grammar and increases its capability for phrase generalization. Phrase-based statistical machine translat...
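To make the "total consistency with word alignment" restriction from contribution (1) concrete, the following is a minimal sketch of consistency-based phrase pair extraction in the style usually attributed to Och. It is not the flexible-scale method proposed in this thesis. The function name `extract_consistent_phrases`, the `max_len` parameter, and the representation of an alignment as a set of `(source_index, target_index)` pairs are assumptions made for illustration, and the sketch omits the usual extension of target spans over unaligned boundary words.

```python
def extract_consistent_phrases(src_len, alignment, max_len=7):
    """Return phrase pairs as (src_start, src_end, tgt_start, tgt_end).

    All spans are inclusive, 0-based word indices. `alignment` is a set of
    (source_index, target_index) links. A pair is kept only if no word
    inside the target span is linked to a source word outside the source
    span (the consistency requirement).
    """
    pairs = []
    for s_start in range(src_len):
        for s_end in range(s_start, min(s_start + max_len, src_len)):
            # Target positions linked to any word in the source span.
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue  # a phrase pair needs at least one alignment link
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Reject the pair if a target word in the span is linked
            # to a source word outside [s_start, s_end].
            violates = any(
                t_start <= t <= t_end and not (s_start <= s <= s_end)
                for (s, t) in alignment
            )
            if not violates:
                pairs.append((s_start, s_end, t_start, t_end))
    return pairs


# Hypothetical three-word sentence pair whose last two words swap order.
example_alignment = {(0, 0), (1, 2), (2, 1)}
example_pairs = extract_consistent_phrases(3, example_alignment)
```

In this toy example the source span covering words 0 to 1 yields no phrase pair: its minimal target span is 0 to 2, which contains target word 1, and that word is linked to source word 2 outside the span. Spans rejected in this way are the kind of case a relaxed consistency criterion is meant to recover.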