Spoken language is an important form of communication between people. In the era of information globalization, people around the world communicate ever more closely in fields such as economics, diplomacy, and culture, and spoken language translation has been widely studied as a key technology for breaking the language barrier. Spoken language translation aims to translate speech or its transcription in the source language into speech or text in the target language. Compared with written language, spoken language exhibits many distinct linguistic phenomena, and spoken text is usually derived from speech recognition output, which contains recognition errors and noise. Building a translation model that is robust to these characteristics of spoken language is therefore of great significance. Existing spoken language translation systems usually add an independent noise-reduction module to mitigate the impact of spoken-language phenomena and noisy data on the translation model, but this leads to problems such as large model size and high translation latency. To this end, this paper designs novel models and methods that aim to improve, in an end-to-end manner, the robustness of spoken language translation models to ill-formed language phenomena (disfluencies) and recognition errors. The main work and contributions of this paper are as follows:
1. A robust spoken language translation method for disfluencies
Neural machine translation achieves good results on written language, such as news, but spoken text often contains redundancy, repetition, revision, and other disfluent phenomena that differ markedly from written language and degrade spoken translation performance. To this end, this paper proposes a multi-task learning method that combines disfluency detection with spoken language translation. The method first uses disfluency detection to identify the disfluencies in a sentence, and then reduces the attention mechanism's focus on disfluent words through an additional loss term, enhancing the encoder's and decoder's ability to model disfluent text. Furthermore, considering the scarcity of bilingual parallel corpora containing disfluencies, this paper proposes a disfluent text synthesis method that simulates the distribution of real spoken language. Experimental results on a Chinese-English translation task show that the proposed method effectively improves the translation quality of disfluent text.
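The auxiliary loss described above can be illustrated with a minimal PyTorch sketch. This is not the thesis's actual implementation: it assumes a binary disfluency mask (e.g. from a detection model) and simply penalizes the cross-attention mass that target positions place on disfluent source tokens; the function name and tensor shapes are illustrative.

```python
import torch

def disfluency_attention_loss(attn_weights, disfluency_mask):
    """Auxiliary loss penalizing attention mass on disfluent source tokens.

    attn_weights:    (batch, tgt_len, src_len) cross-attention weights,
                     each target row summing to 1 over source positions
    disfluency_mask: (batch, src_len) with 1.0 at disfluent tokens, else 0.0
    """
    # Attention mass each target position places on disfluent tokens
    mass_on_disfluent = (attn_weights * disfluency_mask.unsqueeze(1)).sum(dim=-1)
    # Minimizing the mean mass steers attention toward fluent tokens
    return mass_on_disfluent.mean()
```

In a multi-task setup this term would be added, with a weighting coefficient, to the standard translation cross-entropy loss, so that the model learns to translate while down-weighting disfluent input.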
2. A robust spoken language translation method for speech recognition errors
Neural machine translation systems are easily disturbed by input noise. In spoken language translation in particular, the input of the translation module comes from the output of the speech recognition module, which inevitably contains recognition errors that greatly affect translation performance. Existing methods usually cascade an error-correction module before the translation module to reduce the impact of recognition errors, but this increases system latency and may introduce additional noise. To this end, this paper proposes a noise-robust spoken language translation method based on contrastive learning: samples containing recognition errors are taken as positive examples, and a sentence-level or word-level contrastive loss narrows the distance between noisy and clean text in the representation space from a global or local perspective respectively, reducing the impact of recognition errors on text representations. In addition, this paper designs several fine-grained error synthesis methods to simulate real speech recognition errors. Experiments on multiple English-Chinese bidirectional datasets show that the proposed method effectively reduces the impact of speech recognition errors on translation performance, thereby improving the robustness of the model.
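The sentence-level variant of such a contrastive objective can be sketched as an in-batch InfoNCE loss. This is a hedged illustration rather than the thesis's exact formulation: it assumes the noisy (ASR-corrupted) encoding of sentence i is the positive for the clean encoding of sentence i, with the other in-batch sentences serving as negatives; the function name and the temperature value are illustrative.

```python
import torch
import torch.nn.functional as F

def sentence_contrastive_loss(clean, noisy, temperature=0.1):
    """In-batch InfoNCE loss pulling noisy sentence representations
    toward their clean counterparts.

    clean, noisy: (batch, dim) sentence representations; row i of
    `noisy` is the positive example for row i of `clean`.
    """
    clean = F.normalize(clean, dim=-1)
    noisy = F.normalize(noisy, dim=-1)
    # Cosine-similarity logits between every clean/noisy pair
    logits = clean @ noisy.t() / temperature  # (batch, batch)
    # The diagonal holds the positive pairs
    targets = torch.arange(clean.size(0), device=clean.device)
    return F.cross_entropy(logits, targets)
```

A word-level variant would apply the same idea to aligned token representations instead of pooled sentence vectors, giving the "local" view described above.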
To sum up, this paper conducts an in-depth study of the characteristics and open problems of spoken language translation, and proposes spoken translation methods based on multi-task learning and contrastive learning to enhance the translation model's robustness to noisy data from two aspects: disfluencies and recognition errors. Experiments confirm that the proposed methods effectively improve the translation quality of spoken language translation models.
|Keyword||Spoken Language Translation; Robust Neural Machine Translation; Disfluency; Contrastive Learning|
|Sub-direction classification||Natural Language Processing|
|Planning direction of the State Key Laboratory||Speech and Language Processing|