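Thesis Advisor: 张家俊 (Zhang Jiajun)
Degree Grantor: Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Place of Conferral: Beijing
Keywords: word embedding; bias; word alignment; pattern generation; template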
Other Abstract
Natural language pervades every aspect of people's lives and is the most important carrier of information in human communication. Analyzing language automatically with natural language processing (NLP) technologies has become an urgent demand of the intelligent era. A word embedding, as a dense vector representation, can express more latent features and more complete semantic information about a word. It is one of the important foundations of NLP research, and its theoretical study is of great significance. Word alignment identifies correspondences between words in different languages. Word-aligned corpora are an essential resource for many NLP tasks such as machine translation and cross-language information retrieval, so generating high-quality word alignments quickly is particularly important. This thesis proposes a new representation and training method to enhance the semantic representation ability of word embeddings, and then leverages that ability to study word alignment. The main contributions are summarized as follows:
(1) A model to enhance word embeddings
At present, most word embedding learning methods train a single embedding per word, which cannot represent the different senses of polysemous words. Existing solutions address this by training a separate embedding for each sense of a polysemous word. This thesis proposes a new approach to the problem. It differs from most related work in that it learns one semantic center embedding and one context bias, instead of training multiple embeddings per word type. Experimental results on similarity and analogy tasks show that the word representations learned by the proposed method outperform competitive baselines.
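The abstract does not say how the context bias is computed, so the following is only an illustrative sketch of the general "one center plus one context bias" idea, not the thesis's model. The toy vectors in `CENTER`, the function `contextual_embedding`, the weight `alpha`, and the choice of "bias = scaled mean of the context words' center vectors" are all assumptions made for this example.

```python
import math

# Hypothetical 2-D center embeddings, chosen by hand for readability.
CENTER = {
    "bank": [1.0, 1.0],
    "river": [0.0, 2.0],
    "money": [2.0, 0.0],
}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def contextual_embedding(word, context, alpha=0.5):
    """Return center(word) + bias, where the bias is ASSUMED to be
    alpha times the mean of the context words' center vectors."""
    center = CENTER[word]
    known = [CENTER[w] for w in context if w in CENTER]
    if not known:
        # No usable context: fall back to the center embedding alone.
        return list(center)
    dim = len(center)
    mean = [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
    return [center[i] + alpha * mean[i] for i in range(dim)]

bank_river = contextual_embedding("bank", ["river"])
bank_money = contextual_embedding("bank", ["money"])
print(bank_river, bank_money)
```

With a single stored vector for "bank", the two contexts still yield different representations: the one built from the "river" context lies closer to "river" than the one built from the "money" context does.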
(2) A pattern-generation model for word alignment
Deep neural networks have achieved good results in word alignment, but their complex structure greatly reduces training speed. This thesis proposes a pattern-generation model for word alignment. The model extracts word alignment rules using the idea of matrix transformation, and borrows techniques from convolutional neural networks to construct the model. Relying on the strong expressive power of word embeddings, the aligned word block can then receive the highest score in the given parallel sentences. At the same time, the proposed model removes the hidden layer of the neural network, so the computational complexity is greatly reduced. Experimental results show that the proposed method achieves satisfactory results when used to construct a primary model.
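To make the idea of scoring alignments directly from word embeddings, with no hidden layer, more concrete, here is a minimal sketch. It is a plain similarity-matrix baseline and not the pattern-generation model itself, whose rules and convolution-style components the abstract does not specify. The function names `score_matrix` and `greedy_align`, the assumption that source and target embeddings live in a shared space, and the toy vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def score_matrix(src_vecs, tgt_vecs):
    """scores[i][j] = similarity of source word i and target word j."""
    return [[cosine(s, t) for t in tgt_vecs] for s in src_vecs]

def greedy_align(src_vecs, tgt_vecs):
    """For each source word, pick the target index with the highest score."""
    scores = score_matrix(src_vecs, tgt_vecs)
    return [max(range(len(row)), key=row.__getitem__) for row in scores]

# Toy parallel sentence: two source words, two target words in swapped order.
# Embeddings are ASSUMED to be in a shared cross-lingual space.
src_vecs = [[0.9, 0.1], [0.1, 0.9]]
tgt_vecs = [[0.2, 0.8], [1.0, 0.0]]

alignment = greedy_align(src_vecs, tgt_vecs)
print(alignment)  # source word 0 -> target 1, source word 1 -> target 0
```

The only computation is a table of dot products followed by a per-row maximum, which is why dropping hidden layers keeps the cost low. A real aligner would also need to score multi-word blocks and handle unaligned words.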
Document Type: Thesis
Recommended Citation
GB/T 7714
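Hu Wenpeng (胡文鹏). Optimization of Word Embeddings and Its Application in Word Alignment (词向量的优化及其在词对齐中的应用) [D]. Beijing: Graduate University of Chinese Academy of Sciences, 2017.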
Files in This Item:
File Name/Size: 词向量的优化及其在词对齐中的应用.pdf (1627 KB)
DocType: Thesis
Access: Not yet open
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.