CASIA OpenIR > Graduates > Master's Theses
Alternative Title: The Design and Implementation of a Hierarchical Phrase-Based Statistical Translation System
Thesis Advisor: Zong Chengqing (宗成庆)
Degree Grantor: Graduate University of Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Discipline: Pattern Recognition and Intelligent Systems
Keywords: Statistical Machine Translation; Hierarchical Phrase-Based Model; Phrase Extraction; Decoding; Rule Redundancy
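Abstract: Machine translation is an important research topic in natural language processing. In recent years, statistical methods have become the dominant approach in machine translation research. Statistical machine translation has evolved from words to phrases, and from surface string information to syntactic structure information. Each step in this evolution has sought to incorporate more contextual or reordering information into the translation model in order to improve translation quality. The hierarchical phrase-based translation model is an effective approach that combines the advantages of the phrase-based translation model and synchronous context-free grammar. This thesis studies the design and implementation of a hierarchical phrase-based translation engine and investigates its rule redundancy problem.
The main contributions of this thesis are as follows:
(1) We design and implement a hierarchical phrase-based statistical machine translation engine. On Chinese-to-English translation, the engine achieves significantly higher translation quality than a conventional phrase-based system, with particularly good results on sentences that require long-distance reordering.
(2) We propose an algorithm for obtaining hierarchical phrases from a word-aligned bilingual corpus. Using a sweep-line method, the algorithm extracts the hierarchical phrase rules in a single scan of each source-language training sentence, and it is simple to implement. Experiments show that it performs well in computation time.
(3) A hierarchical phrase-based system obtains far more translation rules in training than a conventional phrase-based system, which drives up computational cost. We study how to prune redundant rules in such systems and propose a constraint based on "reordering split points". The constraint effectively reduces the number of rules the system uses, and training and decoding times drop substantially as a result.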
Other Abstract: Statistical approaches have dominated machine translation research in recent years, and statistical machine translation systems have evolved from word-based models to phrase-based models and then to syntax-based models, incorporating more contextual and reordering information into the translation model at each step. The hierarchical phrase-based model is particularly popular because it combines the advantages of the conventional phrase-based model and synchronous context-free grammar rules. This thesis discusses the design and implementation of such a system and investigates ways to reduce the explosive growth of redundant rules in it. The contributions of this work are summarized as follows:
(1) We implemented a hierarchical phrase-based translation engine. On Chinese-English translation tasks, its translation quality is significantly higher than that of conventional phrase-based engines, particularly when long-distance reordering is needed.
(2) We proposed a sweep-line-based algorithm for extracting hierarchical rules from a word-aligned bilingual corpus. The algorithm extracts all hierarchical rules in a single scan of each source training sentence and is easy to implement. Experiments showed that it performs well in computation time.
(3) We analyzed the rule redundancy problem in the hierarchical phrase-based translation model and proposed a rift constraint that effectively reduced the number of rules in the system and accordingly brought a dramatic reduction in training and decoding time, with little sacrifice of translation quality.
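Contribution (2) builds on the standard notion of phrase pairs that are consistent with a word alignment. The sketch below is not the thesis's sweep-line algorithm, whose details this abstract does not give. It is a minimal, brute-force illustration of the generic procedure from Chiang's hierarchical phrase-based model: enumerate consistent phrase pairs, then replace a nested sub-pair with the co-indexed nonterminal X1 to form a one-gap hierarchical rule. The function names `phrase_pairs` and `one_gap_rules`, the `max_len` limit, and the toy sentence pair are hypothetical. Real extractors add further filters (for example, requiring aligned terminals, limiting rule length, allowing two nonterminals, and handling unaligned boundary words).

```python
from typing import List, Set, Tuple

# Word alignment as (source_index, target_index) links.
Alignment = Set[Tuple[int, int]]
# Phrase pair as inclusive spans: (src_start, src_end, tgt_start, tgt_end).
Span = Tuple[int, int, int, int]


def phrase_pairs(n_src: int, align: Alignment, max_len: int = 5) -> List[Span]:
    """Enumerate phrase pairs consistent with the alignment (brute force).

    Simplification (assumption): the target span is the tightest span
    covering the linked words, so unaligned edge words are never added.
    """
    pairs: List[Span] = []
    for i1 in range(n_src):
        for i2 in range(i1, min(n_src, i1 + max_len)):
            linked = [j for (i, j) in align if i1 <= i <= i2]
            if not linked:
                continue  # require at least one alignment link
            j1, j2 = min(linked), max(linked)
            # Consistency: no word inside the target span may link
            # to a source word outside the source span.
            if any((j1 <= j <= j2) and not (i1 <= i <= i2) for (i, j) in align):
                continue
            pairs.append((i1, i2, j1, j2))
    return pairs


def one_gap_rules(src: List[str], tgt: List[str], align: Alignment,
                  max_len: int = 5) -> List[Tuple[str, str]]:
    """Return plain phrase rules plus rules containing one nonterminal X1."""
    pairs = phrase_pairs(len(src), align, max_len)
    rules = set()
    for (i1, i2, j1, j2) in pairs:
        # Every consistent phrase pair is itself a (terminal-only) rule.
        rules.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
        for (k1, k2, l1, l2) in pairs:
            nested = i1 <= k1 and k2 <= i2 and j1 <= l1 and l2 <= j2
            if nested and (k1, k2) != (i1, i2):
                # Replace the nested sub-pair on both sides with X1.
                s = src[i1:k1] + ["X1"] + src[k2 + 1:i2 + 1]
                t = tgt[j1:l1] + ["X1"] + tgt[l2 + 1:j2 + 1]
                rules.add((" ".join(s), " ".join(t)))
    return sorted(rules)


if __name__ == "__main__":
    src = ["在", "北京", "工作"]
    tgt = ["work", "in", "Beijing"]
    align = {(0, 1), (1, 2), (2, 0)}
    for lhs, rhs in one_gap_rules(src, tgt, align):
        print(f"X -> <{lhs}, {rhs}>")
```

On the toy pair, the rule X -> <X1 工作, work X1> generalizes the reordering of a locative phrase around the verb, which a flat phrase table cannot express. The nested loop over phrase pairs is quadratic, and even this three-word sentence yields 5 phrase pairs and 11 rules, which hints at the rule-count growth that contribution (3) addresses.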
Other Identifier: 200528014628073
Document Type: Thesis
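Recommended Citation (GB/T 7714):
方李成 (Fang Licheng). 基于层次短语的统计机器翻译引擎的设计与实现 [Design and Implementation of a Hierarchical Phrase-Based Statistical Machine Translation Engine] [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences, 2008.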
Files in This Item:
CASIA_20052801462807 (1522 KB); Access: not yet open; License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.