The Application of Chinese Language Models for Statistical Machine Translation System

Title	汉语语言模型在统计机器翻译系统中的应用
Alternative title	The Application of Chinese Language Models for Statistical Machine Translation System
Author	王韦华
	2009-12-21
Degree type	Master of Engineering
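Abstract (translated from Chinese)	Most machine translation systems today use statistical methods, the mainstream of which comprises phrase-based systems and hierarchical phrase-based systems. The language model plays a very important role in a statistical translation system: it makes the translation output conform more closely to the grammar of the target language. To determine what effect n-gram language models of different scales and orders actually have on different systems, this thesis carries out a large number of experiments and compares and analyzes the results. The main work of the thesis is summarized as follows: 1. It introduces the overall architecture of a phrase-based statistical machine translation system and the implementation and optimization of each functional module, chiefly language model training, translation model training, the phrase-based decoder, minimum error rate training, and post-processing. 2. It describes the principles and implementation of a hierarchical phrase-based statistical machine translation system. 3. It describes how to preprocess a Chinese-English parallel corpus to meet the needs of a machine translation system, covering the initial and deep processing required to turn the raw corpus into the data needed for training the translation model and the statistical model, and implements a Chinese-English corpus preprocessing platform. 4. It analyzes the influence of the scale of the Chinese language model on statistical machine translation systems, specifically studying the effect of the model's size and n-gram order in two English-Chinese statistical machine translation systems: a phrase-based system and a hierarchical phrase-based system. In summary, this thesis carries out a large number of experiments and fairly in-depth research on training corpus preprocessing, system implementation and optimization, and the influence of language model scale on the system, and improves the performance of the existing experimental system.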
English abstract	At present, statistical methods, including phrase-based and hierarchical phrase-based systems, are predominant in the field of machine translation. The language model plays an important role in a statistical translation system: it makes the translation conform to the grammar of the target language. To find out how the scale and n-gram order of Chinese language models affect English-Chinese machine translation systems, we carried out many experiments in this thesis. The main contributions of this thesis are as follows: 1. A study of the framework of the phrase-based system and each of its functional modules, namely language model training, translation model training, the decoder, minimum error rate training, and post-processing. 2. A description of the implementation of the hierarchical phrase-based statistical translation system. 3. A study of how to process a Chinese-English parallel corpus for a machine translation system, taking the corpus from its raw form to a form ready for training, and the development of a platform for corpus preprocessing. 4. A study of the effects of the scale and n-gram order of Chinese language models in English-Chinese machine translation systems. Experiments show that, for the same language model, the hierarchical phrase-based MT system performs better than the phrase-based MT system, while for the same MT system, the scale and order of the language model clearly affect the BLEU score. A larger, higher-order language model does not necessarily give a better result. In general, this thesis focuses on preprocessing of the training data, implementation of the machine translation systems, and the scale of Chinese language models for statistical machine translation systems, work that has greatly improved the translation results.
Keywords	Statistical Machine Translation; Training Data Preprocessing; Chinese Language Model; Data Post-processing
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/7503
Collection	Graduates_Master's theses
Recommended citation (GB/T 7714)	王韦华. 汉语语言模型在统计机器翻译系统中的应用[D]. 中国科学院自动化研究所. 中国科学院研究生院,2009.

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20062801462804 (858KB)			Not yet open	CC BY-NC-SA