Syntactic parsing is one of the fundamental problems in Natural Language Processing (NLP). It resolves the structural ambiguities of natural language so that more advanced NLP tasks can build on it. Taking morphological analysis as its foundation, parsing studies how words and phrases make up a sentence and what roles and relations they hold within it. Phrase structure parsing is the traditional syntactic formalism. However, dependency parsing has gained more and more attention in recent years because of its advantages in capturing natural language structure and its importance for semantic analysis and other purposes. This thesis aims to improve the speed and accuracy of dependency parsing. Its novelties and main contributions are summarized as follows:

(1) A layer-by-layer dependency parser based on sequence labeling models is proposed. Graph-based models and transition-based models are the two dominant data-driven paradigms in the dependency parsing community. The units over which they search for the optimal structure are the whole sentence and a couple of words, respectively, so the two kinds of methods represent the two extremes of optimal structure searching. In this thesis, we adopt an intermediate unit for parser modeling: the dependency layer. Inside a layer the dependency graphs are searched exhaustively, while between layers the parser state transfers deterministically. Taking the dependency layer as the parsing unit, the proposed parser has a lower computational complexity than graph-based models and alleviates the error propagation that transition-based models suffer from. Furthermore, the parser uses sequence labeling models to find the optimal graph of each layer, which demonstrates that sequence labeling techniques are also capable of hierarchical structure analysis.
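The layer-by-layer idea can be illustrated with a minimal sketch. It is not the parser proposed in the thesis: the label set (`L`, `R`, `O`), the function names `parse_by_layers` and `toy_labeler`, and the rule-based labeler standing in for a trained sequence labeling model are all assumptions made for illustration. In each layer every remaining word is labeled as attaching to its left neighbor, attaching to its right neighbor, or staying; attached words are then removed deterministically before the next layer is labeled.

```python
from typing import Callable, Dict, List, Tuple

Token = Tuple[int, str, str]  # (1-based position, word form, POS tag)


def toy_labeler(layer: List[Token]) -> List[str]:
    """Hypothetical stand-in for a trained sequence labeler.

    Returns one label per word in the layer:
    "R" = attach to right neighbor, "L" = attach to left neighbor, "O" = stay.
    """
    labels = []
    for i, (_, _, pos) in enumerate(layer):
        left = layer[i - 1][2] if i > 0 else None
        right = layer[i + 1][2] if i + 1 < len(layer) else None
        if pos in ("DET", "ADJ") and right == "NOUN":
            labels.append("R")
        elif pos == "NOUN" and right == "VERB":
            labels.append("R")
        elif pos == "NOUN" and left == "VERB":
            labels.append("L")
        else:
            labels.append("O")
    return labels


def parse_by_layers(
    tokens: List[Token],
    label_layer: Callable[[List[Token]], List[str]],
) -> Dict[int, int]:
    """Build a head map {dependent position: head position}, 0 meaning root."""
    heads: Dict[int, int] = {}
    layer = list(tokens)
    while len(layer) > 1:
        labels = label_layer(layer)  # one labeling pass per layer
        attached = set()
        for i, label in enumerate(labels):
            if label == "R" and i + 1 < len(layer):
                heads[layer[i][0]] = layer[i + 1][0]
                attached.add(i)
            elif label == "L" and i > 0 and labels[i - 1] != "R":
                # The guard prevents two neighbors from attaching to each other.
                heads[layer[i][0]] = layer[i - 1][0]
                attached.add(i)
        if not attached:
            break  # no progress in this layer: stop and attach the rest to root
        # Deterministic transition: only unattached words enter the next layer.
        layer = [tok for i, tok in enumerate(layer) if i not in attached]
    for position, _, _ in layer:
        heads[position] = 0
    return heads


sentence = [
    (1, "the", "DET"), (2, "cat", "NOUN"), (3, "chased", "VERB"),
    (4, "a", "DET"), (5, "small", "ADJ"), (6, "mouse", "NOUN"),
]
heads = parse_by_layers(sentence, toy_labeler)
print(sorted(heads.items()))
```

On this toy sentence the loop runs three layers and leaves "chased" as the root. Each layer costs one labeling pass over the remaining words, which is the property that makes a layer-based design attractive for speed.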
The layer-based framework, the neighboring relation analysis mechanism, and CRF-based labeling give the proposed approach desirable accuracy and, especially, a fast parsing speed, which will be quite helpful for large-scale corpus analysis.

(2) A two-pass dependency parsing approach for long Chinese sentences is presented. Sentence segmentation is one of the effective ways to handle the problems of long sentence parsing. Traditional approaches use classified punctuation marks as the dividers, but the poor accuracy of punctuation classification limits improvement in final parsing performance. In the propose...