汉语语言模型在统计机器翻译系统中的应用
Alternative Title: The Application of Chinese Language Models for Statistical Machine Translation System
王韦华
Subtype: Master of Engineering
Thesis Advisor: 徐波
2009-12-21
Degree Grantor: Graduate University of the Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: Statistical Machine Translation; Training Data Preprocessing; Chinese Language Model; Data Post-processing
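Abstract: At present, most machine translation systems use statistical methods, the mainstream of which comprises phrase-based systems and hierarchical phrase-based systems. The language model plays a very important role in a statistical translation system: it makes the translation output conform more closely to the grammar of the target language. What effect n-gram language models of different sizes and orders actually have on different systems, however, is the question this thesis examines through a large number of comparative experiments and analyses. The main work of the thesis is summarized as follows:
1. It presents the overall architecture of a phrase-based statistical machine translation system and the implementation and optimization of each functional module, chiefly language model training, translation model training, the phrase-based decoder, minimum error rate training, and post-processing.
2. It describes the principles and implementation of a hierarchical phrase-based statistical machine translation system.
3. It explains how to preprocess Chinese-English parallel corpora to meet the needs of a machine translation system, covering the initial and in-depth processing required to turn raw corpora into the data needed for training the translation model and the statistical model, and implements a Chinese-English corpus preprocessing platform.
4. It analyzes the effect of Chinese language model scale on statistical machine translation, studying specifically how the size and n-gram order of the Chinese language model affect two English-Chinese statistical machine translation systems: a phrase-based system and a hierarchical phrase-based system.
In summary, this thesis carries out extensive experiments and fairly in-depth research on statistical machine translation in the areas of training-corpus preprocessing, system implementation and optimization, and the effect of language model scale on the system, and improves the performance of the existing experimental system.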
Other Abstract: At present, statistical methods, including phrase-based and hierarchical phrase-based systems, are predominant in the field of machine translation. The language model plays an important role in a statistical translation system: it makes the translation conform to the grammar of the target language. This thesis asks how the scale and n-gram order of Chinese language models affect English-Chinese machine translation systems, and reports many experiments on that question. The main contributions are as follows:
1. A study of the framework of the phrase-based system and each of its functional modules: language model training, translation model training, the decoder, minimum error rate training, and post-processing.
2. A description of the implementation of a hierarchical phrase-based statistical translation system.
3. A study of how to process Chinese-English parallel corpora for machine translation, taking a corpus from its raw form to a training-ready form, together with the development of a platform for corpus preprocessing.
4. A study of the effects of the scale and n-gram order of Chinese language models in English-Chinese machine translation systems. Experiments show that, with the same language model, the hierarchical phrase-based MT system outperforms the phrase-based MT system, while for the same MT system the scale and order of the language model clearly affect the BLEU score. A larger, higher-order language model does not necessarily give a better result.
In general, this thesis focuses on the preprocessing of training data, the implementation of machine translation systems, and the scale of Chinese language models for statistical machine translation systems, and this work has greatly improved the translation results.
Shelf Number: XWLW1460
Other Identifier: 200628014628047
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/7503
Collection: Graduates_Master's Theses
Recommended Citation
GB/T 7714
王韦华. 汉语语言模型在统计机器翻译系统中的应用[D]. 中国科学院自动化研究所. 中国科学院研究生院,2009.
Files in This Item:
File Name/Size: CASIA_20062801462804 (858KB)
Access: Not yet open
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.