With the rapid growth of text data on the Internet, efficiently parsing natural language text has become an important and active research topic. Syntactic parsing, one of the fundamental problems in NLP, has great research value because it is an indispensable step toward a deep understanding of text. In addition, text on the Internet is often written in multiple languages, which makes language barriers more severe, and people are eager to understand information from sources in different languages. Word alignment, a key step in automatically extracting translation knowledge, has therefore become a research focus as well. Accordingly, this thesis focuses on two tasks: syntactic (phrase structure) parsing and word alignment.
Traditional phrase structure parsing and word alignment methods share two major issues: strong independence assumptions and a lack of supporting lexical information. A dependency relation describes the binary, asymmetric head-modifier relation between words. It carries both rich lexical information and structural dependency information. Furthermore, dependency relations essentially describe semantic relationships, which hold objectively and are language-independent. We believe that integrating dependency relations into phrase structure parsing and word alignment can overcome both issues at the same time. This thesis is therefore devoted to an in-depth study of how to integrate dependency relations effectively with parsing and word alignment.
The main contributions of this thesis can be summarized as follows: (1) A novel phrase structure parsing method is proposed with the help of dependency structure.
Comparing phrase structure trees with dependency trees on both English and Chinese treebanks, we find that the two structures are generally consistent. We therefore design a novel CKY parsing algorithm that constructs the phrase structure tree under the guidance of the dependency tree. Experimental results on both English and Chinese treebanks show that, with gold-standard dependency trees, the F1 scores for English and Chinese phrase structure parsing are 96.08% and 90.61% respectively; with the N-best dependency trees generated automatically by MSTParser, the F1 sco...
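To illustrate how a dependency tree might guide a CKY chart, the following minimal Python sketch prunes candidate spans before any grammar rules are applied. It is not the algorithm of this thesis: the consistency criterion (a span is kept only if exactly one of its words has its head outside the span, so the span forms a connected fragment of the dependency tree), the `heads` encoding, and the function names `span_is_consistent` and `allowed_spans` are all assumptions made for illustration.

```python
def span_is_consistent(heads, i, j):
    """Return True if words i..j (inclusive) form a connected dependency fragment.

    heads[k] is the index of the head of word k, or -1 for the root word.
    Assumed criterion (illustrative only): exactly one word in the span
    has its head outside the span.
    """
    external = sum(1 for k in range(i, j + 1) if not (i <= heads[k] <= j))
    return external == 1


def allowed_spans(heads):
    """Enumerate the chart cells (i, j) a dependency-guided CKY parser would fill."""
    n = len(heads)
    return {
        (i, j)
        for i in range(n)
        for j in range(i, n)
        if span_is_consistent(heads, i, j)
    }


# "saw the dog": "saw" is the root, "the" modifies "dog", "dog" modifies "saw".
heads = [-1, 2, 0]
spans = allowed_spans(heads)
# The span (0, 1) = "saw the" is pruned: both words have heads outside it.
```

A CKY parser using such a filter would only build constituents over the cells in `spans`, which is one simple way dependency information can shrink the search space; the thesis's actual guidance mechanism may differ.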