神经机器翻译模型研究 (Research on Neural Machine Translation Models)
杨振
Subtype: Doctoral
Thesis Advisor: 徐波
Date: 2019-05
Degree Grantor: 中国科学院大学 (University of Chinese Academy of Sciences)
Place of Conferral: 中国科学院自动化研究所 (Institute of Automation, Chinese Academy of Sciences)
Degree Name: Doctor of Engineering
Degree Discipline: Pattern Recognition and Intelligent Systems
Keywords: Neural Machine Translation; Multi-Sense Embeddings; Adversarial Networks; Unsupervised Machine Translation; General Training Framework; Low-Resource
Abstract

Neural network based machine translation models (neural machine translation, NMT) have evolved from the plain encoder-decoder model, to the encoder-decoder model with attention, to the fully convolutional translation model, and finally to the fully attention-based translation model. Each innovation along this path has brought a large improvement in translation quality, and NMT systems now far surpass traditional statistical machine translation systems. Despite this disruptive success, NMT still suffers from problems such as under-translation, over-translation, low accuracy on rare words, lexical ambiguity, and heavy dependence on high-quality bilingual corpora, and these problems still call for improvements to the NMT model itself. Taking deep learning as its theoretical foundation, this thesis studies and explores NMT models in depth; the main contributions are as follows:

1. This thesis is the first to study NMT models that use Chinese characters as the modeling unit. It proposes a word-composition method based on bidirectional row convolution, which automatically distills word-level information from character sequences. The proposed character-level translation model does not depend on word-boundary information, which greatly alleviates the out-of-vocabulary problem. On objective evaluation metrics it achieves translation performance comparable to a word-level baseline system.

2. To address lexical ambiguity in NMT, this thesis is the first to propose an NMT model based on multi-sense embeddings. At the sense level, each meaning of a word is represented by its own embedding; during encoding, an attention mechanism selects the sense embedding that fits the current context, improving the model's ability to encode word meaning precisely.

3. This thesis is the first to introduce adversarial networks into NMT, so that the translation model directly learns to output sentences that resemble real training samples. It describes in detail strategies that keep adversarial training stable and efficient, such as teacher forcing, weight clipping, and pre-training, and carefully analyzes how different optimization criteria affect NMT performance. The proposed approach outperforms the Transformer model on both Chinese-English and English-German translation tasks.

4. This thesis is the first to propose an unsupervised NMT model based on weight sharing, which can be trained without any bilingual data and still achieves good translation performance. Its effectiveness is verified on three language pairs (Chinese-English, English-German, English-French), covering five translation directions. In addition, by fusing multilingual training with unsupervised training, the thesis proposes a general training framework for NMT models that significantly improves translation performance.

Other Abstract

Neural network based machine translation (NMT) has evolved from the naive encoder-decoder model, to the encoder-decoder model with attention, to the fully convolutional model, and finally to the fully attention-based model. Each innovation in the NMT model has brought a significant improvement in translation performance, and NMT has greatly surpassed traditional statistical machine translation (SMT). However, NMT still has unsolved problems, such as over-translation, under-translation (words missing from the output), low accuracy on rare words, ambiguous words, and heavy dependence on large-scale bilingual data, and these remaining problems must be addressed by improving the NMT model itself. Building on deep learning, this work investigates NMT models in depth. The main contributions are as follows:

1. This work investigates a novel character-aware NMT model that views input sequences as sequences of characters rather than words. Using row convolution, the encoder of the proposed model automatically composes word-level information from the input character sequences. The model does not rely on word boundaries, which alleviates the out-of-vocabulary problem in NMT. Experimental results show that the proposed character-aware NMT model achieves translation performance comparable to traditional word-based NMT models.
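As a rough illustration only (the thesis's actual model is a neural network; the window sizes and elementwise filters here are assumptions for the sketch), a bidirectional row convolution that composes each character position with its neighbors can be sketched as:

```python
import numpy as np

def row_convolution(x, w):
    """Forward row convolution: out[t] = sum_i w[i] * x[t+i].

    x: (T, d) character embeddings; w: (window, d) elementwise filter rows.
    Positions past the end of the sequence are treated as zeros.
    """
    T, _ = x.shape
    out = np.zeros_like(x)
    for t in range(T):
        for i in range(len(w)):
            if t + i < T:
                out[t] += w[i] * x[t + i]
    return out

def bidirectional_row_convolution(x, w_fwd, w_bwd):
    """Run one pass left-to-right and one on the reversed sequence,
    so each character position sees context on both sides."""
    fwd = row_convolution(x, w_fwd)
    bwd = row_convolution(x[::-1], w_bwd)[::-1]
    return fwd + bwd
```

Because each output position mixes a fixed window of neighboring characters, word-like units can emerge without any explicit segmentation, which is the intuition behind dropping word boundaries.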

2. This work proposes a novel NMT model based on multi-sense embeddings. Each sense of a word is represented by a separate sense embedding, and during encoding an attention mechanism lets the NMT model focus only on the sense of the input word that is relevant in the current context. Experimental results show that the proposed multi-sense NMT model greatly alleviates the side effects of ambiguous words.
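A minimal sketch of the sense-selection idea, assuming dot-product attention scores and a context vector summarizing the surrounding words (both assumptions; the thesis may score senses differently):

```python
import numpy as np

def softmax(z):
    z = z - z.max()           # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def contextual_sense_embedding(senses, context):
    """Attend over a word's sense embeddings with the current context.

    senses: (K, d) matrix, one embedding per sense of the word;
    context: (d,) vector summarizing the surrounding words.
    Returns a context-weighted mixture of the sense embeddings.
    """
    weights = softmax(senses @ context)   # relevance of each sense
    return weights @ senses               # (d,) blended representation
```

When one sense dominates the attention weights, the encoder effectively reads only that sense, which is how the ambiguity of a polysemous word is resolved per context.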

3. This work proposes an approach for applying GANs to NMT, training the translation model to directly output samples that are indistinguishable from real ones. It describes in detail the strategies used to keep adversarial training stable, such as teacher forcing, weight clipping, and pre-training, and analyzes how the training criterion affects translation performance. Experimental results show that the proposed approach outperforms the state-of-the-art Transformer on Chinese-English and English-German translation tasks.
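The alternating training loop with the stabilization tricks named above can be sketched in skeleton form. This is a hedged outline, not the thesis's implementation: `generator_update` and `discriminator_update` stand in for full NMT/discriminator optimizer steps, and the clipping bound `c=0.01` is an assumption borrowed from common WGAN practice.

```python
import numpy as np

def clip_weights(params, c=0.01):
    """Weight clipping: after each discriminator update, force every
    parameter into [-c, c] to keep adversarial training stable."""
    return [np.clip(p, -c, c) for p in params]

def adversarial_step(generator_update, discriminator_update, d_params,
                     teacher_forcing_ratio=0.5, rng=None):
    """One alternating step: update and clip the discriminator, then
    update the generator, occasionally falling back to teacher forcing
    (a plain maximum-likelihood step) instead of the adversarial signal."""
    rng = rng if rng is not None else np.random.default_rng()
    d_params = clip_weights(discriminator_update(d_params))
    generator_update(mle=rng.random() < teacher_forcing_ratio)
    return d_params
```

Pre-training fits in before this loop: both networks would first be trained conventionally so the adversarial phase starts from a sensible initialization.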
 
4. This work investigates NMT models in low-resource settings and proposes an unsupervised NMT model based on weight sharing. With this new approach we achieve significant improvements on English-German, English-French, and Chinese-English translation tasks. To train NMT models on monolingual and parallel data at the same time, the work fuses multilingual training approaches with unsupervised training approaches and proposes a general training framework for NMT models, which significantly improves their performance.
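A toy sketch of the weight-sharing idea (assumed structure for illustration: one private layer per language plus one shared upper layer; the thesis's encoders are full neural networks with their own architecture):

```python
import numpy as np

class WeightSharingEncoders:
    """Each language keeps its own lower (embedding-like) layer, while
    the upper encoder layer is a single shared matrix, nudging both
    languages toward one common latent space -- the property unsupervised
    NMT relies on when no bilingual data is available."""

    def __init__(self, d, rng=None):
        rng = rng if rng is not None else np.random.default_rng(0)
        self.private = {"src": rng.standard_normal((d, d)),
                        "tgt": rng.standard_normal((d, d))}
        self.shared = rng.standard_normal((d, d))  # used by both languages

    def encode(self, x, lang):
        h = np.tanh(x @ self.private[lang])  # language-specific layer
        return np.tanh(h @ self.shared)      # shared layer
```

Because `self.shared` is literally the same parameter block for both languages, gradients from monolingual training in either language shape the same latent space, which is what makes the shared representation emerge.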

Subject Area: Computer Science and Technology; Artificial Intelligence Theory
MOST Discipline Catalogue: Engineering :: Computer Science and Technology (degree may be conferred in engineering or science)
Pages: 114
Language: Chinese
Document Type: Dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/23716
Collection: 数字内容技术与服务研究中心 — 听觉模型与认知计算 (Research Center for Digital Content Technology and Services — Auditory Models and Cognitive Computing)
Recommended Citation (GB/T 7714):
杨振. 神经机器翻译模型研究 [D]. 中国科学院自动化研究所, 中国科学院大学, 2019.
Files in This Item:
博士毕业论文_杨振.pdf (3951 KB) · DocType: Dissertation · Access: Open Access · License: CC BY-NC-SA
 
