Research on Robust Spoken Language Translation Methods

	Research on Robust Spoken Language Translation Methods
	王世宁
	2022-11-25
Pages	72
Degree type	Master's
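Chinese abstract (translated)	Spoken language is an important form of expression for communication between people. In today's era of information globalization, spoken exchanges among people around the world are increasingly frequent in economics, diplomacy, culture, and other fields, and spoken language translation, a key technology for breaking down language barriers, has been widely studied. Spoken language translation aims to translate speech in the source language, or its transcription, into speech or text in the target language. Compared with written text, spoken language exhibits many different linguistic phenomena, and spoken text usually comes from speech recognition output, which contains recognition errors and noise. How to build a noise-robust translation model tailored to the characteristics of spoken language is therefore of great significance for spoken language translation research. Existing spoken language translation systems usually design a separate denoising module to reduce the impact of spoken-language phenomena and noisy data on the translation model, but this leads to large models and high translation latency. To this end, this thesis designs novel models and methods that aim to improve, in an end-to-end manner, the robustness of spoken language translation models to two kinds of noise: ill-formed language phenomena (disfluencies) and recognition errors. The main work and contributions are as follows:

1. A robust spoken language translation method for disfluencies. Neural machine translation performs well on written text such as news, but spoken text often contains disfluencies such as redundancy, repetition, correction, and restatement. These make it differ considerably from written text, so the performance of spoken language translation falls far short of written-language translation. To this end, this thesis proposes a multi-task learning method that combines disfluency detection with spoken language translation. The method first uses disfluency detection to identify the disfluent components of a sentence, and then uses an additional loss function to reduce the attention that the attention mechanism pays to disfluent words, strengthening the ability of the encoder and decoder to model disfluent text. In addition, because bilingual parallel corpora containing disfluencies are very scarce, this thesis proposes a disfluent text synthesis method to simulate the distribution of real spoken language. Experimental results on a Chinese-to-English translation task show that the proposed method effectively improves translation quality on disfluent text.

2. A robust spoken language translation method for speech recognition errors. Neural machine translation systems are highly susceptible to input noise. In spoken language translation in particular, the input of the translation module is the output of the speech recognition module, which inevitably contains recognition errors that severely affect translation performance. Existing methods usually cascade an error correction module with the translation module to reduce the impact of recognition errors, but this tends to increase system latency and may introduce additional noise. To this end, this thesis proposes a noise-robust spoken language translation method based on contrastive learning. The method treats samples containing recognition errors as positive examples and uses a sentence-level or word-level contrastive loss to pull noisy text and clean text closer together in the representation space, from a global and a local perspective respectively, reducing the impact of recognition errors on text representations. This thesis also designs several fine-grained error synthesis methods to simulate real speech recognition errors. Experiments on multiple datasets in both the English-to-Chinese and Chinese-to-English directions show that the proposed method effectively reduces the impact of speech recognition errors on translation performance and thereby improves model robustness.

In summary, this thesis studies the characteristics and open problems of spoken language translation in depth. Addressing disfluencies and recognition errors respectively, it proposes spoken language translation methods based on multi-task learning and contrastive learning to make translation models more robust to noisy data. The experiments confirm that the proposed methods effectively improve the translation quality of spoken language translation models.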
English abstract	Spoken language is an important form of communication between people. In the era of information globalization, people around the world communicate by speech ever more closely in fields such as the economy, diplomacy, and culture. Spoken language translation has been widely studied as a key technology for breaking the language barrier. It aims to translate speech or its transcription in the source language into speech or text in the target language. Compared with written language, spoken language exhibits many different linguistic phenomena, and spoken text is usually derived from speech recognition output, which contains recognition errors and noise. Building a robust translation model suited to the characteristics of spoken language is therefore of great significance to the study of spoken language translation. Existing spoken language translation systems usually design an independent noise reduction module to reduce the impact of spoken-language phenomena and noisy data on the translation model, but this leads to problems such as large model size and high translation latency. To this end, this thesis designs novel models and methods that aim to improve the robustness of spoken language translation models to ill-formed language phenomena (disfluencies) and recognition errors in an end-to-end manner. The main work and innovations of the thesis are as follows:

1. A robust spoken language translation method for disfluencies. Neural machine translation achieves good results on written text such as news, but spoken text often contains redundancy, repetition, correction, restatement, and other disfluencies. These differ considerably from written language and lead to poor spoken language translation performance. To this end, this thesis proposes a multi-task learning method that fuses disfluency detection and spoken language translation. The method first uses disfluency detection to identify the disfluencies in a sentence, and then reduces the attention mechanism's attention to the disfluent words through an additional loss function, so as to enhance the ability of the encoder and decoder to model disfluent text. Furthermore, considering the scarcity of bilingual parallel corpora containing disfluencies, this thesis proposes a disfluent text synthesis method to simulate the distribution of real spoken language. Experimental results on the Chinese-English translation task show that the proposed method effectively improves the translation quality of disfluent text.

2. A robust spoken language translation method for speech recognition errors. Neural machine translation systems are easily disturbed by input noise. In spoken language translation in particular, the input of the translation module is the output of the speech recognition module, which inevitably contains recognition errors that have a great impact on translation performance. Existing methods usually cascade error correction and translation modules to reduce the impact of recognition errors, but this tends to increase system latency and may introduce additional noise. To this end, this thesis proposes a noise-robust spoken language translation method based on contrastive learning. The method takes samples containing recognition errors as positive examples and, through a sentence-level or word-level contrastive loss, narrows the distance between noisy text and clean text in the representation space from a global and a local perspective respectively, reducing the impact of recognition errors on text representation. This thesis also designs a variety of refined error synthesis methods to simulate real speech recognition errors. Experiments on multiple English-Chinese bidirectional datasets show that the proposed method effectively reduces the impact of speech recognition errors on translation performance, thereby improving the robustness of the model.

To sum up, this thesis conducts an in-depth study of the characteristics and existing problems of spoken language translation, and proposes spoken language translation methods based on multi-task learning and contrastive learning to make the translation model more robust to noisy data from two aspects: disfluencies and recognition errors. The experiments confirm that the proposed methods effectively improve the translation quality of spoken language translation models.
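The two auxiliary losses described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not give the formulas, so the function names `attention_penalty` and `sentence_contrastive_loss`, the use of cosine similarity, the `temperature` value, and the choice of other sentences in the batch as negatives are all assumptions made for illustration.

```python
import math


def attention_penalty(attn, disfluent_mask):
    """Average attention mass placed on tokens flagged as disfluent.

    attn: one row per query position; each row is a probability
          distribution over the source tokens.
    disfluent_mask: 0/1 flags per source token (1 = disfluent), as a
          disfluency detector might produce. Hypothetical input format.
    Adding this value to the translation loss pushes attention away
    from the flagged tokens.
    """
    total = 0.0
    for row in attn:
        total += sum(a * m for a, m in zip(row, disfluent_mask))
    return total / len(attn)


def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def sentence_contrastive_loss(clean, noisy, temperature=0.1):
    """InfoNCE-style sentence-level contrastive loss.

    clean[i] and noisy[i] are sentence vectors for the same sentence
    without and with recognition errors, so noisy[i] is the positive
    for clean[i]. Assumption: the other noisy[j] in the batch serve
    as negatives.
    """
    losses = []
    for i, anchor in enumerate(clean):
        logits = [cosine(anchor, cand) / temperature for cand in noisy]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


if __name__ == "__main__":
    attn = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
    mask = [0, 1, 0]  # the second source token is flagged as disfluent
    print("attention penalty:", attention_penalty(attn, mask))

    clean = [[1.0, 0.0], [0.0, 1.0]]
    noisy = [[0.9, 0.1], [0.1, 0.9]]
    print("contrastive loss:", sentence_contrastive_loss(clean, noisy))
```

In a training setup of this kind, both terms would typically be added to the translation loss with tunable weights. A word-level variant would apply the same contrastive form to aligned token representations instead of sentence vectors.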
Keywords	spoken language translation; robust neural machine translation; disfluency; contrastive learning
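Language	Chinese
Seven major directions: sub-direction classification	Natural language processing
State Key Laboratory planned direction classification	Speech and language processing
Associated dataset to be deposited	No
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/50595
Collection	Graduates_Master's theses
Recommended citation (GB/T 7714)	王世宁. 鲁棒的口语翻译方法研究[D],2022.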

Files in this item
File name/size	Document type	Version type	Access type	License
201928014628062王世宁.p (3004KB)	Thesis		Restricted access	CC BY-NC-SA