Research and Implementation on System Combination for Machine Translation (机器翻译系统融合方法研究与实现)


Title	机器翻译系统融合方法研究与实现
Alternative Title	Research and Implementation on System Combination for Machine Translation
Author	李茂西 (Li Maoxi)
Date	2011-05-26
Degree Type	Doctor of Engineering
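Abstract (translated from the Chinese)	In recent years, statistical machine translation has developed rapidly, and many statistical translation models based on different paradigms have been proposed, such as phrase-based and syntax-based models. Each model has its own strengths and weaknesses. Extracting useful information from the outputs of multiple machine translation engines, and combining the strengths of those engines into a final high-quality translation, is crucial to improving the output quality of a machine translation system as a whole. Research on system combination for machine translation therefore has significant theoretical and practical value. Within the framework of word-level system combination, and using Chinese-to-English and English-to-Chinese machine translation systems as experimental platforms, this thesis studies system combination methods in depth and puts them into practice. The main work and contributions are as follows:
1. A translation hypothesis alignment method based on word reordering is proposed and implemented. Monolingual sentence alignment is an important step in word-level system combination. Unlike the existing edit-distance-based word error rate (WER) and translation error rate (TER) methods, the proposed method directly moves the chunks that need reordering in a translation hypothesis to their correct positions. The basic idea is as follows. First, all common contiguous chunks shared by the translation hypotheses are found and each is replaced with a variable. Next, the reduced hypotheses are locally aligned and crossing alignments are identified; according to these crossing alignments, the words in the hypothesis are moved to their correct positions. Finally, a dynamic programming algorithm aligns the two hypotheses, which now share a consistent word order. Experiments in the news translation and spoken language translation domains show that the method significantly improves translation quality.
2. System combination methods at different levels are compared. Machine translation has three levels of system combination: sentence-level, phrase-level, and word-level. This thesis compares the performance of the three on corpora from the spoken language translation and news translation domains. The results show that word-level system combination integrates linguistic knowledge most deeply, yields the largest improvement in translation quality, and performs most stably.
3. Character-based and word-based methods for Chinese translation quality evaluation and system combination are compared. In Chinese information processing, the word is usually treated as the basic processing unit, but the Chinese character can serve as a basic unit as well. For automatic evaluation and system combination of machine translation output, no prior work had compared word-based and character-based methods. This thesis therefore studies automatic evaluation and system combination for Chinese translations. Experiments show that character-based automatic evaluation correlates better with human evaluation than word-based automatic evaluation. For combining Chinese translations, the character-based method is statistically significantly better than the word-based method. The thesis analyzes this phenomenon in depth.
4. An online statistical machine translation system based on Web services is implemented. Combining the advantages of Web service network communication with the characteristics of statistical machine translation systems, this thesis builds an online statistical machine translation system based on Web services. The system uses Web service technology for network communication between remote client terminals and the local Web server, and uses network sockets to send and receive translation text between the Web server and the machine translation servers. Experiments show that this approach can effectively organize multiple statistical machine translation engines so that they work in coordination, and can improve the response speed and the number of concurrent users of the online system.
In summary, this thesis proposes a translation hypothesis alignment method based on word reordering, analyzes and compares the performance of system combination methods at different levels on several corpora, and analyzes Chinese translation quality evaluation...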
Abstract (English)	Over the past decade, machine translation has developed greatly. A variety of paradigms for statistical machine translation (SMT) have been proposed, including phrase-based and syntax-based SMT models. Each model has its strengths and weaknesses, so it is worthwhile to integrate the advantages of multiple translation engines and overcome their shortcomings. Machine translation system combination extracts useful information from the outputs of multiple machine translation engines to produce a final consensus translation, and it has attracted widespread attention as an effective way to improve translation quality. Research on machine translation system combination therefore has important theoretical and practical value. Under the framework of word-level system combination, and taking Chinese-to-English and English-to-Chinese machine translation systems as the experimental platform, we study system combination methods in depth and put them into practice. The major contributions are as follows:
1. We present and implement a new word reordering alignment approach. The alignment between paired monolingual sentences is an important step in word-level system combination. We present a word reordering alignment (WRA) approach for combining SMT systems. Unlike previous approaches based on edit distance, such as WER or TER, WRA directly shifts word sequences to their correct locations within the translation hypothesis. The common continuous word sequences are first found and replaced by variables. We then align the variables and the words that are identical in the two sentences, and detect the crossing alignments that indicate a need for reordering. According to these crossing alignments, the detected word sequences are shifted to the correct positions, and dynamic programming is used to align the sentences after reordering. Experiments in the newswire translation and spoken language translation domains show that the approach can significantly improve translation quality.
2. We compare different approaches to system combination. There are three levels of system combination methods for machine translation: sentence-level, phrase-level, and word-level system combination. To compare different system combination approaches, we conduct experiment on s...
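The last step of contribution 1, aligning two hypotheses with dynamic programming once their word order agrees, can be illustrated with a generic edit-distance alignment. This is a minimal sketch and not the thesis implementation: the function name `align_hypotheses`, the unit edit costs, and the tie-breaking order in the backtrace are all assumptions made for illustration, and the chunk replacement and reordering steps of WRA are not shown.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def align_hypotheses(ref: List[str], hyp: List[str]) -> List[Pair]:
    """Align two token lists with unit-cost edit distance (assumed costs).

    Returns (ref_token, hyp_token) pairs; None marks an empty slot,
    i.e. a token present in only one of the two hypotheses.
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimum number of edits to align ref[:i] with hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitute = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(substitute, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    a = "the cat sat on the mat".split()
    b = "the cat sat on mat".split()
    for pair in align_hypotheses(a, b):
        print(pair)
```

In word-level system combination, aligned pairs of this kind are what a confusion network is built from: each pair becomes one slot, and a `None` entry stands for an empty arc.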
Keywords	Machine Translation; System Combination; Automatic Evaluation on Machine Translation; Web Services; Confusion Network
Language	Chinese
Document Type	Thesis
Item Identifier	http://ir.ia.ac.cn/handle/173211/6345
Collection	Graduates_Doctoral Dissertations
Recommended Citation (GB/T 7714)	李茂西. 机器翻译系统融合方法研究与实现[D]. 中国科学院自动化研究所. 中国科学院研究生院,2011.

Files in This Item
File Name/Size	Document Type	Version Type	Access Type	License
CASIA_20071801462804 (1374KB)			Not yet open	CC BY-NC-SA