As technology and society develop, globalization accelerates and international exchange becomes more frequent, making it increasingly important to break down language barriers between countries and regions. Machine translation, the technology of using a computer to automatically convert a sentence from one language into another, is an effective solution to this problem; it has attracted growing attention from researchers and has developed rapidly in recent years. Statistical machine translation (SMT) is currently the focus of research. Although machine translation has made great progress over several decades, many theoretical and technical problems remain unsolved, and fully automatic high-quality translation (FAHQT) is still difficult to achieve. This thesis focuses on the translation of out-of-vocabulary (OOV) words and unknown phrases in spoken language translation (SLT), and on data selection methods for statistical machine translation. The study has both theoretical significance and practical value. The main work and contributions of this thesis are summarized as follows:

(1) We propose and implement an interactive translation method for OOV words. OOV words are common in spoken language and are often the key points of a sentence, so translating them incorrectly greatly degrades the performance of the translation system. Because of data sparseness, however, this problem is hard for the machine to solve automatically. We therefore propose an interactive translation method for OOV words. First, we detect the boundary of the OOV word through interaction with a human user. Second, we use a classifier to determine the category of the OOV word. Third, we translate the OOV word with the translation module corresponding to that category to generate candidate translations.
Finally, the user decides which candidate translation is correct. Each successfully translated OOV word is saved to a memory base, so that the system can handle the same word automatically the next time it appears. Translation quality improves thanks to the knowledge about OOV words that the user provides.

(2) We propose and implement an interactive translation method for unknown phrases. Because the decoder of a phrase-based SMT system uses an exact-match policy, many phrases cannot find an exact match in the phrase table and become unknown p...