|Place of Conferral||Beijing|
|Keywords||machine translation; out-of-vocabulary words; self-attention neural network; disfluency detection; punctuation restoration|
Cross-language communication and information transmission are an essential foundation for frontier academic research and for sustainable enterprise development. The aim of speech translation research is therefore to help monolingual users achieve low-cost, fast, and high-quality communication across language barriers. Speech-to-speech translation consists of three components: automatic speech recognition (ASR), machine translation, and text-to-speech synthesis. In recent years, machine translation for spoken language has become both a difficulty and a hotspot of speech translation research. An automatic speech recognizer typically produces a word sequence without punctuation that may contain ellipsis, repetition, or even confusion, so non-canonical sentences frequently appear in speech translation. Rare out-of-vocabulary words are another difficulty, since they introduce ambiguity into machine translation. The performance of a speech translation system is therefore determined by the fault tolerance, comprehension ability, and adaptability of its translation models with respect to ASR output.
In this thesis, we focus on machine translation and spoken-language normalization, taking both normal text and ASR output as research objects. To improve speech translation quality, we concentrate on the following three problems: rare out-of-vocabulary (OOV) words in machine translation, punctuation restoration for ASR output, and disfluency detection in ASR output.
First, a class-specific copy neural network is proposed to alleviate the OOV problem, which has long been a difficulty in machine translation. We design a set of named-entity categories, including common named entities and number phrases, and collect a named-entity corpus for machine translation. To relieve the model of the burden of "understanding" rare words, the copy mechanism was previously proposed to handle rare and unseen words in attention-based neural network models. Its drawback, however, is that the model can only decide whether or not to copy; it cannot determine which class a rare word should be copied as, such as person, location, or organization. This thesis investigates this limitation of NMT models in depth and proposes a new NMT model that incorporates a class-specific copy network. With this network, the model can decide which class a target word belongs to and which class in the source it should be copied from. Experimental results on Chinese-English translation tasks show that the proposed model outperforms the traditional NMT model by a large margin, especially on sentences containing rare words.
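The class-specific mixing described above can be sketched as a single decoding step; the function below and its inputs (per-class mode scores, entity tags on source positions, the rule that an empty class simply loses its mass) are illustrative assumptions, not the thesis's exact formulation:

```python
import numpy as np

def softmax(x):
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()

def class_specific_copy_step(gen_logits, copy_attn, mode_logits, src_tags):
    """One decoding step of a class-specific copy mechanism (sketch).

    gen_logits : scores over the target vocabulary
    copy_attn  : attention scores over source positions
    mode_logits: scores over modes [generate, copy-PER, copy-LOC, copy-ORG]
    src_tags   : entity tag per source position ('O', 'PER', 'LOC', 'ORG')
    Returns (p_vocab, p_copy): probability mass assigned to generating a
    vocabulary word and to copying each source position.
    """
    p_mode = softmax(mode_logits)
    p_vocab = p_mode[0] * softmax(gen_logits)
    p_copy = np.zeros(len(src_tags))
    for m, tag in enumerate(['PER', 'LOC', 'ORG'], start=1):
        # restrict the copy attention of class `tag` to positions of that class
        mask = np.array([t == tag for t in src_tags])
        if mask.any():
            scores = np.where(mask, np.asarray(copy_attn, dtype=float), -np.inf)
            p_copy += p_mode[m] * softmax(scores)
    return p_vocab, p_copy
```

The point of the sketch is the per-class masking: an ordinary copy mechanism would compute one attention over all source positions, whereas here each copy class can only place mass on source tokens of its own entity type.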
Second, a hybrid-attention character-level neural network framework is proposed for machine translation, which elegantly blends character attention with attention over internally composed words. In our work, a bidirectional Gated Recurrent Unit (GRU) network automatically composes word-level information from the input character sequence. In contrast to traditional NMT models, two different attentions are incorporated into the proposed model: a character-level attention that attends to the original input characters, and a word-level attention that attends to the automatically composed words. With these two attentions, the model encodes character-level and word-level information at the same time. We find that the composed word-level information is compatible with and complementary to the original character-level input. The experimental results show that the proposed method has a clear advantage in machine translation.
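The two-attention idea can be sketched as follows; mean pooling over character spans stands in for the thesis's BiGRU word composition, and the function name and shapes are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def hybrid_attention(query, char_states, word_spans):
    """Blend character-level and word-level attention (sketch).

    query      : (d,) decoder query vector
    char_states: (T, d) character encoder states
    word_spans : (start, end) character ranges of the composed words; mean
                 pooling stands in here for the BiGRU composition
    Returns the concatenation [character context; word context].
    """
    word_states = np.stack([char_states[s:e].mean(axis=0) for s, e in word_spans])
    a_char = softmax(char_states @ query)   # attention over raw characters
    a_word = softmax(word_states @ query)   # attention over composed words
    return np.concatenate([a_char @ char_states, a_word @ word_states])
```

Both contexts are computed from the same character encoder, so the word-level view adds information without requiring a separate word-segmented input.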
Third, a multi-task self-attention neural network is proposed for punctuation restoration. Traditional methods built on the sequence-labelling framework are weak at handling joint punctuation. To tackle this problem, we propose a novel multi-task network. The key difference between our model and the vanilla self-attention network lies in the final output layer of the decoder, where two softmax layers predict the label sequence and the word sequence separately. We conduct extensive experiments on complex punctuation tasks. The results show that the proposed model achieves significant improvements on the joint punctuation task while also remaining superior to traditional methods on the simple punctuation task. In addition, this thesis applies punctuation restoration to an in-house machine translation system, which noticeably improves its performance.
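The two-headed output layer can be sketched in a few lines; the single-head attention with identity projections and the label set in the comment are simplifying assumptions for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Single-head scaled dot-product self-attention (identity Q/K/V
    projections kept for brevity)."""
    A = softmax(X @ X.T / np.sqrt(X.shape[-1]), axis=-1)
    return A @ X

def punctuation_step(X, W_label, W_word):
    """Multi-task output layer: two softmax heads over one shared state,
    one predicting the punctuation-label sequence (e.g. O / COMMA /
    PERIOD / QUESTION), one predicting the word sequence."""
    H = self_attention(X)
    return softmax(H @ W_label, axis=-1), softmax(H @ W_word, axis=-1)
```

Sharing the encoder state between the two heads is what makes this multi-task: the word-prediction head acts as an auxiliary signal for the punctuation head.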
Finally, a semi-supervised model based on multi-task self-attention and weight sharing is proposed for disfluency detection. In this work, we view disfluency detection as a translation task. The multi-task self-attention network incorporates the word-sequence information and the labelling information at the same time, and a constrained decoding method is applied during testing. Experimental results show that the proposed model achieves significant improvements over strong baseline models. We are among the first to use a translation model for disfluency detection, and the approach can easily be applied to other sequence-labelling tasks. In addition, we utilize unlabelled corpora to enhance performance by introducing a weight-sharing strategy and generative adversarial training, which enforce similar distributions between labelled and unlabelled data. Experimental results show that the semi-supervised model further improves performance significantly.
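The translation view can be illustrated with a minimal keep-or-delete sketch: every output token must come from the input, in order, which is the constraint the decoder enforces. The per-token probabilities here are given by hand rather than produced by a trained model:

```python
def remove_disfluencies(tokens, disfl_prob, threshold=0.5):
    """Sketch of disfluency detection as 'translating' a disfluent
    utterance into its fluent form: the only operations are keep ('F')
    or delete ('D'), so the output is a subsequence of the input.

    disfl_prob gives an assumed per-token disfluency probability in
    place of a trained model's scores.
    """
    labels = ['D' if p > threshold else 'F' for p in disfl_prob]
    fluent = [t for t, l in zip(tokens, labels) if l == 'F']
    return labels, fluent
```

For example, in the classic repair "a flight to Boston uh I mean to Denver", the reparandum "to Boston" and the editing phrase "uh I mean" should be labelled disfluent, leaving "a flight to Denver".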
|First Author Affilication||Institute of Automation, Chinese Academy of Sciences|
|Wang Feng. Research on Key Technologies of Neural-Network-Based Speech Translation [D]. Beijing: University of Chinese Academy of Sciences, 2018.|