Research on Multilingual Neural Machine Translation Methods Based on Cross-lingual Knowledge Transfer
王迁
2022-11-30
Pages: 130
Degree type: Doctoral
Chinese Abstract

In recent years, end-to-end neural machine translation has achieved remarkable success, improving translation quality significantly over traditional statistical machine translation. However, neural machine translation models consume large amounts of computing and storage resources, and deploying a separate model for every language pair is extremely costly. Multilingual neural machine translation handles translation between different languages with a single model and can therefore greatly reduce the cost of applying neural machine translation technology. However, existing multilingual models cannot fully exploit the general translation knowledge shared across languages and suffer from the negative effects of language divergence, so their translation quality degrades as the number of languages grows. Modeling cross-lingual knowledge transfer in multilingual translation models, in order to increase the sharing of translation knowledge across languages and reduce the negative interference between them, is therefore crucial for improving multilingual machine translation and promoting its application. Centering on the three stages of model construction, training, and decoding, this thesis studies cross-lingual knowledge transfer in multilingual machine translation models and exploits the general translation knowledge and complementary information between languages to improve translation quality. The main contributions and innovations of this thesis are summarized as follows:

1. A Multilingual Neural Machine Translation Method Based on Translation Task Clustering

Multilingual machine translation handles translation between multiple languages with a single shared model, but as the number of languages grows, the model faces a heavier modeling burden, which in turn degrades translation quality. To address this problem, this thesis proposes a model construction method for multilingual neural machine translation based on translation task clustering. The method defines the relation between tasks in a multilingual translation model as task affinity, i.e., the degree to which optimizing one translation task affects the loss of another. Using task affinity as a distance measure, the method clusters the translation tasks and builds a multilingual translation model for each cluster, thereby exploiting the general translation knowledge between tasks more fully and improving translation quality. Experiments show that the proposed method significantly improves translation quality in one-to-many, many-to-one, and many-to-many multilingual translation scenarios. A hedged formalization of the task affinity measure follows.
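The abstract defines task affinity only in words. One plausible formalization, assuming a one-step lookahead update in the style of inter-task affinity measures from multi-task learning (the learning rate $\eta$ and the one-step form are assumptions, not necessarily the thesis's exact definition):

$$\mathcal{A}_{i \to j}^{(t)} = 1 - \frac{\mathcal{L}_j\big(\theta^{(t)} - \eta\, \nabla_{\theta} \mathcal{L}_i(\theta^{(t)})\big)}{\mathcal{L}_j\big(\theta^{(t)}\big)}$$

where $\theta^{(t)}$ denotes the shared parameters at training step $t$. A positive $\mathcal{A}_{i \to j}^{(t)}$ means that a gradient step on task $i$ also lowers task $j$'s loss, i.e., the two tasks transfer positively.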

2. A Multilingual Neural Machine Translation Method Based on Parameter Differentiation

A multilingual neural machine translation model typically contains shared parameters and language-specific parameters: the shared parameters model language-independent general translation knowledge, while the language-specific parameters model language-specific features. How to design a parameter sharing strategy that balances the learning of language-specific knowledge and general translation knowledge during training is a core problem in multilingual neural machine translation. For the training stage, this thesis proposes a parameter differentiation based method that lets the model dynamically convert shared parameters into language-specific parameters during training. The method uses the gradient directions of different translation tasks as the differentiation criterion: tasks whose gradient directions diverge strongly are given their own parameters, while tasks with similar gradient directions keep sharing parameters. As training proceeds, the parameter sharing strategy is continuously refined, steadily strengthening the model's ability to capture both general translation knowledge and language-specific knowledge. Experiments show that, at the same parameter scale, the proposed method significantly improves the translation quality of multilingual neural machine translation systems, and that the automatically learned parameter sharing strategy correlates strongly with prior features such as linguistic properties and the domain distribution of the data. A hedged sketch of the differentiation criterion is given below.
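One natural instantiation of the gradient-direction criterion, assuming pairwise cosine similarity between the per-task gradients of a shared parameter $\theta$ (the threshold $\tau$ is an illustrative assumption):

$$\operatorname{sim}(i,j) = \frac{\langle \nabla_{\theta}\mathcal{L}_i,\, \nabla_{\theta}\mathcal{L}_j\rangle}{\lVert \nabla_{\theta}\mathcal{L}_i\rVert \,\lVert \nabla_{\theta}\mathcal{L}_j\rVert}, \qquad \theta \text{ is differentiated if } \min_{i \neq j} \operatorname{sim}(i,j) < \tau$$

Tasks whose gradients agree keep sharing $\theta$; tasks whose gradients conflict receive their own copies of it.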

3. A Multilingual Neural Machine Translation Method Based on Synchronous Interactive Generation

Although a multilingual neural machine translation model can handle translation between multiple languages, during decoding it can generate only one target language at a time, which not only limits decoding efficiency but also blocks knowledge transfer between target languages at the decoding stage. To address this, this thesis proposes a synchronous interactive generation based decoding method for multilingual machine translation. Given a source sentence, the method generates translations in multiple target languages synchronously, and at each step exploits the partial translations already generated in the other target languages, thereby enabling cross-lingual information interaction during decoding. Experiments show that the proposed method achieves better translation quality at different data scales and significantly improves decoding efficiency over sequential generation. The factorization sketched below makes the interaction explicit.
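One way to write down the synchronous interactive factorization, assuming $K$ target languages decoded in lock-step (the exact conditioning set is an assumption consistent with the description above):

$$P\big(y^{(1)},\ldots,y^{(K)} \mid x\big) = \prod_{t=1}^{T}\prod_{k=1}^{K} P\big(y^{(k)}_{t} \mid y^{(1)}_{<t},\ldots,y^{(K)}_{<t},\, x\big)$$

so that language $k$'s token at step $t$ is conditioned on the partial translations of all $K$ targets rather than only on its own prefix.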

In summary, this thesis investigates cross-lingual knowledge transfer methods in multilingual neural machine translation and, for the model construction, training, and decoding stages, proposes a series of methods that exploit general translation knowledge and complementary information between languages to improve translation quality. Experiments show that the proposed methods significantly improve both the translation quality and the efficiency of multilingual translation.

English Abstract

In recent years, end-to-end neural machine translation has achieved great success, with significant improvements in translation quality over traditional statistical machine translation methods. However, neural machine translation models require large computing and storage resources, and deploying an individual translation model for each language pair is highly resource-consuming. Multilingual machine translation handles the translation between multiple languages within a single translation model, which significantly reduces the cost of training and deploying neural machine translation models. However, existing multilingual translation methods cannot fully exploit the general translation knowledge shared between languages and suffer from language divergence, so translation quality drops as the number of languages increases. Therefore, modeling cross-lingual knowledge transfer in multilingual neural machine translation, so as to enhance the sharing of general translation knowledge and alleviate the negative influence between languages, is crucial for improving the performance and promoting the application of multilingual machine translation. This thesis focuses on three phases of multilingual machine translation, namely model construction, training, and decoding, studies cross-lingual knowledge transfer in each phase, and exploits the general translation knowledge and complementary information between languages to improve translation quality. The main contributions are summarized as follows:

1. Translation Task Clustering based Multilingual Neural Machine Translation

Multilingual machine translation handles translation between multiple languages with a shared model. However, as the number of languages increases, the model faces a greater modeling burden, which in turn hurts translation quality. To solve this problem, this thesis proposes a multilingual machine translation model construction method based on translation task clustering. The method uses task affinity as the clustering criterion, defined as the change in the loss of one translation task caused by the optimization of another. Based on task affinity, the method clusters all translation tasks and builds a separate multilingual translation model for each cluster, with each model handling the tasks in its cluster. In this way, the proposed method can fully exploit the general translation knowledge between tasks and improve the quality of multilingual machine translation. Experiments show that the proposed method significantly improves translation quality in one-to-many, many-to-one, and many-to-many multilingual translation scenarios. A hedged clustering sketch follows.
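A minimal sketch of the clustering step, assuming a task-affinity matrix has already been measured during training; turning affinity into a distance and using scikit-learn (>= 1.2) AgglomerativeClustering are illustrative choices, not the thesis's exact procedure:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical symmetric affinity matrix for 6 translation tasks:
# affinity[i, j] > 0 means optimizing task i tends to lower task j's loss.
tasks = ["en-de", "en-fr", "en-es", "en-zh", "en-ja", "en-ko"]
affinity = np.array([
    [1.0, 0.6, 0.5, 0.1, 0.0, 0.1],
    [0.6, 1.0, 0.7, 0.1, 0.1, 0.0],
    [0.5, 0.7, 1.0, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.2, 1.0, 0.5, 0.4],
    [0.0, 0.1, 0.1, 0.5, 1.0, 0.6],
    [0.1, 0.0, 0.1, 0.4, 0.6, 1.0],
])

# Turn affinity into a distance: highly affine tasks should be close.
distance = affinity.max() - affinity

clustering = AgglomerativeClustering(
    n_clusters=2, metric="precomputed", linkage="average"
).fit(distance)

# One multilingual model is then trained per cluster.
for cluster_id in range(2):
    members = [t for t, c in zip(tasks, clustering.labels_) if c == cluster_id]
    print(f"model {cluster_id}: {members}")
```

With this toy matrix, the three European-target tasks and the three East-Asian-target tasks would land in separate clusters, each served by its own multilingual model.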

2. Parameter Differentiation based Multilingual Neural Machine Translation

A multilingual neural machine translation model contains shared parameters and language-specific parameters: the shared parameters model language-independent general translation knowledge, and the language-specific parameters model language-specific features. How to design a parameter sharing strategy that balances the learning of language-specific and general translation knowledge during training is a core issue in multilingual neural machine translation. This thesis proposes a parameter differentiation based multilingual machine translation method, which enables the model to dynamically convert shared parameters into language-specific ones during training. The method uses the gradients of different translation tasks as the differentiation criterion, and parameters with conflicting inter-task gradients are more likely to become language-specific. Experiments show that the proposed method significantly improves translation quality over several strong baselines at the same parameter scale, and that the automatically learned parameter sharing strategy correlates well with linguistic proximity and domain similarity. A minimal sketch of one differentiation step is given below.
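A minimal sketch, assuming per-task gradients are accumulated for each shared tensor and pairwise cosine similarity decides whether to split; the threshold and the copy-on-split policy are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def should_differentiate(task_grads: list[torch.Tensor], tau: float = 0.0) -> bool:
    """Split a shared parameter if any two tasks pull it in conflicting directions."""
    flat = [g.flatten() for g in task_grads]
    for i in range(len(flat)):
        for j in range(i + 1, len(flat)):
            if F.cosine_similarity(flat[i], flat[j], dim=0) < tau:
                return True
    return False

def differentiate(shared_param: torch.Tensor, num_tasks: int) -> list[torch.Tensor]:
    """Replace one shared tensor with per-task copies initialized from it."""
    return [shared_param.detach().clone().requires_grad_(True) for _ in range(num_tasks)]

# Toy example: two tasks with opposing gradients on one shared weight.
w = torch.randn(4, 4, requires_grad=True)
grads = [torch.ones(4, 4), -torch.ones(4, 4)]   # cosine similarity = -1
if should_differentiate(grads):
    per_task_w = differentiate(w, num_tasks=2)  # each task now trains its own copy
```

Initializing the copies from the shared tensor preserves what was learned before the split, so differentiation refines rather than restarts training.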

3. Synchronous Interactive Generation based Multilingual Neural Machine Translation

Although a multilingual neural machine translation model can handle translation between multiple languages, it can only generate one target language at a time during decoding, which not only limits decoding efficiency but also hinders positive knowledge transfer among target languages at the decoding stage. To solve this problem, this thesis proposes a synchronous interactive inference method for multilingual machine translation. Given a sentence in the source language, the method generates translations in multiple target languages simultaneously and exploits the already-generated partial translations in all target languages, enabling cross-lingual information interaction during decoding. Experiments show that the proposed method achieves better translation quality at different data scales and significantly improves decoding efficiency compared with conventional sequential generation. A minimal decoding-loop sketch follows.
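A minimal greedy-decoding sketch, assuming a model whose decoder step can attend to the prefixes of all target languages at once; the `model.step` interface and greedy search are illustrative assumptions (the thesis presumably uses a trained interactive decoder, likely with beam search):

```python
import torch

def synchronous_decode(model, src: torch.Tensor, num_langs: int,
                       bos: int, eos: int, max_len: int = 128) -> list[list[int]]:
    """Greedily generate num_langs target sentences in lock-step.

    At every step, each language's next token is predicted from the source
    and from the current prefixes of *all* target languages, so information
    flows between the parallel hypotheses during decoding.
    """
    prefixes = [[bos] for _ in range(num_langs)]
    finished = [False] * num_langs
    for _ in range(max_len):
        for k in range(num_langs):
            if finished[k]:
                continue
            # Hypothetical interface: logits over language k's vocabulary,
            # conditioned on the source and on every language's prefix.
            logits = model.step(src, prefixes, lang=k)
            next_token = int(logits.argmax(dim=-1))
            prefixes[k].append(next_token)
            if next_token == eos:
                finished[k] = True
        if all(finished):
            break
    return prefixes
```

Because all targets advance within a single loop over time steps, one pass produces every translation, rather than one full decoding run per target language.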

In summary, this thesis studies cross-lingual knowledge transfer in multilingual neural machine translation in depth, and proposes methods that utilize general translation knowledge and complementary information between languages at the model construction, training, and decoding stages to improve multilingual translation quality. Experiments show that the proposed methods significantly improve both the translation quality and the efficiency of multilingual translation.

Keywords: Neural Machine Translation; Translation Task Clustering; Parameter Differentiation; Synchronous Interactive Decoding
Language: Chinese
Sub-direction classification (seven major directions): Natural Language Processing
State Key Laboratory planned direction classification: Speech and Language Processing
Document type: Dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/50593
Collection: Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714):
王迁. 基于跨语言知识迁移的多语言神经机器翻译方法研究[D]. 2022.
Files in this item:
论文电子版-签名.pdf (2123 KB) | Document type: Dissertation | Access: Restricted | License: CC BY-NC-SA