Within the framework of multiple system combination, this paper studies several key problems in statistical machine translation (SMT): the processing and optimization of Chinese-English bilingual corpora, the construction of a multi-engine platform, and the optimization of the phrase-based model. It proposes solutions to these problems and verifies their effectiveness experimentally. The main contributions are as follows:

1. Construction of a Chinese-English bilingual corpus, and a content-based optimization method for bilingual corpus processing.

2. Construction of a multi-engine SMT platform, with strategies for optimizing the phrase-based model and the common modules. We build a multi-engine experimental platform for research on SMT models and algorithms; it can also serve as a transfer platform for application development. In optimizing the phrase-based model, we focus on phrase extraction and probability computation.

3. A local prediction re-ordering model, based on relative position vectors, for the phrase-based system. Phrase re-ordering is the major problem in phrase-based SMT. This paper proposes a prediction model based on the relative positions and orientations of phrases.

4. A framework for multiple system combination based on Confusion Network decoding. The framework performs word-level combination using Minimum Bayes Risk decoding and Confusion Network decoding. The word posterior, language model, part-of-speech (POS) language model, and word penalty are added as features to a log-linear model, and beam search is used to find the best path as the output.
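To make the last contribution concrete, the following is a minimal sketch of beam search over a confusion network scored by a log-linear model. It is an illustration under stated assumptions, not the system described in this paper: the network layout (a list of slots holding word/posterior pairs, with the empty string as the null arc), the function name `decode_confusion_network`, the toy language model `toy_lm`, and all weights are hypothetical. The POS language model feature is left out for brevity; it would enter as one more weighted term of the same form.

```python
import math
from typing import Callable, Dict, List, Tuple

# One slot of the confusion network: alternative (word, posterior) arcs.
# The empty string "" stands for the null (epsilon) arc. Hypothetical layout.
Slot = List[Tuple[str, float]]


def decode_confusion_network(
    network: List[Slot],
    lm_logprob: Callable[[Tuple[str, ...], str], float],
    weights: Dict[str, float],
    beam_size: int = 5,
) -> Tuple[List[str], float]:
    """Return the best word sequence and its log-linear score."""
    beam: List[Tuple[float, Tuple[str, ...]]] = [(0.0, ())]
    for slot in network:
        expanded = []
        for score, words in beam:
            for word, posterior in slot:
                # Word posterior feature (posteriors are assumed to be > 0).
                new_score = score + weights["posterior"] * math.log(posterior)
                if word:
                    # Real word: add language model and word penalty features.
                    new_score += weights["lm"] * lm_logprob(words, word)
                    new_score += weights["word_penalty"]
                    new_words = words + (word,)
                else:
                    # Null arc: the hypothesis keeps its current words.
                    new_words = words
                expanded.append((new_score, new_words))
        # Prune: keep only the highest-scoring partial hypotheses.
        expanded.sort(key=lambda hyp: hyp[0], reverse=True)
        beam = expanded[:beam_size]
    best_score, best_words = beam[0]
    return list(best_words), best_score


def toy_lm(history: Tuple[str, ...], word: str) -> float:
    """Toy bigram scores, invented for this example only."""
    likely = {("the", "cat"), ("cat", "is"), ("is", "sleeping")}
    prev = history[-1] if history else "<s>"
    return math.log(0.5) if (prev, word) in likely else math.log(0.1)


# Toy network with made-up posteriors.
network = [
    [("the", 0.7), ("a", 0.3)],
    [("cat", 0.6), ("cap", 0.4)],
    [("", 0.5), ("is", 0.5)],
    [("sleeping", 0.8), ("sleep", 0.2)],
]
weights = {"posterior": 1.0, "lm": 1.0, "word_penalty": 0.0}
best_words, best_score = decode_confusion_network(network, toy_lm, weights)
print(" ".join(best_words), best_score)
```

In this sketch the word penalty is a per-word constant, so a negative weight favors shorter outputs and a positive weight favors longer ones. Hypotheses that reach the same word sequence by different paths are not merged here; a fuller decoder would typically recombine them.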