基于依存关系的短语结构句法分析与词对齐方法研究及实现
Alternative Title: Research and Implementation of Methods on Phrase Structure Parsing and Word Alignment Based on Dependency Relation
王志国 (Wang Zhiguo)
Subtype: Doctor of Engineering
Thesis Advisor: 宗成庆 (Zong Chengqing)
2013-05-20
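Degree Grantor: University of Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Discipline: Computer Application Technology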
Keyword: Word Alignment; Lattice; Phrase Structure Parsing; Dependency Relation; Dependency Cohesion
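Abstract: In recent years, with the rapid growth of text data on the Internet, how to use natural language processing techniques to process massive amounts of data efficiently has become a focus of attention. Syntactic parsing, a fundamental technology in natural language processing, is a key step toward deep understanding of text, and studying it in depth has both theoretical significance and practical value. In addition, the Internet contains a large amount of information expressed in different languages, and people increasingly wish to understand information expressed in languages other than their own. Bilingual word alignment, a key link in automatically acquiring translation knowledge, is an important means of breaking through language barriers. The work in this thesis therefore centers on two tasks: phrase structure parsing and bilingual word alignment.
Traditional phrase structure parsing models and word alignment models both suffer from two major deficiencies: their independence assumptions are too strong, and they lack the support of lexical information.
Dependency relations describe the governing and governed relations between words. They carry rich lexical information as well as the structural dependencies between words. Moreover, dependency relations essentially describe semantic relations, and different languages are connected at the semantic level, so dependency relations exist objectively across language boundaries. Given these properties, if dependency relations can be applied appropriately to phrase structure parsing and word alignment, both deficiencies can be addressed at once, which would greatly help improve the quality of phrase structure parsing and word alignment. On this basis, the thesis investigates in depth how to integrate dependency relations into phrase structure parsing and bilingual word alignment. The main contributions and innovations are summarized as follows:
(1) A method that uses dependency structure to guide phrase structure parsing. By comparing phrase structure trees and dependency trees in the Chinese and English treebanks, we find two kinds of mapping between the two syntactic structures: a node mapping and a derivation rule mapping. Based on this finding we design a new CKY algorithm that uses the dependency tree to guide the generation of the phrase structure tree. Given a dependency tree, the algorithm uses the node mapping to determine the positions at which phrase labels are to be created, and uses the derivation rule mapping to determine which phrase labels are used to create the current phrase label. Experiments on the Penn English Treebank and the Penn Chinese Treebank show that, with fully correct dependency trees, the F1 scores of English and Chinese phrase structure parsing reach 96.08% and 90.61% respectively; with N-best dependency trees generated automatically by MSTParser, the F1 scores reach 90.54% and 83.93% respectively, both exceeding the results of BerkeleyParser.
(2) A phrase structure tree reranking model based on higher-order dependency relations. Further analysis shows that the method above guides phrase structure parsing by treating dependency relations as hard constraints, so any error in the dependency relations directly affects parsing quality. To address this weakness, we propose a model that reranks phrase structure trees using higher-order dependency relations. The model first generates a constrained search space for the sentence to be parsed (such as an N-best parse tree list or a parse forest), then extracts higher-order dependency features within that space and uses them to rerank the candidate phrase structure trees, finally selecting the best one. Experiments on the Penn Chinese Treebank show that the model's highest F1 score reaches 85.74%, exceeding the best result obtained so far on the Penn Chinese Treebank. In addition, the accuracy of the dependency trees generated from the phrase structure trees also improves substantially.
(3) A lattice framework model for jointly handling Chinese word segmentation, part-of-speech tagging, and syntactic parsing. For the joint task of Chinese word segmentation, part-of-speech tagging, and syntactic parsing, if it is handled directly in the traditional pipeline manner...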
Other Abstract: With the rapid growth of text data on the Internet, efficiently parsing natural language text has become an important and active research topic. Syntactic parsing, one of the fundamental problems in NLP, has great research value because it is an indispensable step toward deep understanding of text. In addition, text on the Internet is written in many languages, which makes language barriers more severe, and people are eager to understand information from sources in different languages. Word alignment, a key step in automatically extracting translation knowledge, has therefore become a research focus. In this thesis we focus on two tasks: syntactic (phrase structure) parsing and word alignment. Traditional phrase structure parsing and word alignment methods have two major issues: strong independence assumptions and a lack of supporting lexical information. A dependency relation describes the binary, asymmetric head-modifier relation between words. It contains both abundant lexical information and structural dependency information. Furthermore, dependency relations essentially describe semantic relationships, which hold objectively and are language-independent. We believe that integrating dependency relations into phrase structure parsing and word alignment would overcome both issues at once. This thesis therefore studies in depth how to integrate dependency relations effectively with parsing and word alignment. The main contributions of this thesis can be summarized as follows:
(1) A novel phrase structure parsing method guided by dependency structure is proposed. Comparing phrase structure trees with dependency trees on both English and Chinese treebanks, we find that there is a general consistency between these two structures. We therefore design a novel CKY parsing algorithm that constructs the phrase structure tree under the guidance of the dependency tree. Experimental results on both English and Chinese treebanks show that, with gold-standard dependency trees, the F1 scores for English and Chinese phrase structure parsing are 96.08% and 90.61% respectively; with the N-best dependency trees generated automatically by MSTParser, the F1 sco...
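The abstract describes using a dependency tree to decide where phrase labels may be created in the CKY chart. The record does not give the thesis's actual node mapping, so the following is only a minimal sketch under a simplifying assumption: each word's dependency subtree covers one contiguous span, and those spans are treated as candidate chart cells. The function names `subtree_spans` and `candidate_cells` and the example sentence are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # inclusive (start, end) word indices


def subtree_spans(heads: List[int]) -> Dict[int, Span]:
    """Map each word to the span covered by its dependency subtree.

    heads[i] is the 0-based index of word i's head, or -1 for the root.
    Assumes a well-formed tree (single root, no cycles).
    """
    def depth(i: int) -> int:
        d = 0
        while heads[i] != -1:
            i = heads[i]
            d += 1
        return d

    spans: Dict[int, Span] = {i: (i, i) for i in range(len(heads))}
    # Visit the deepest words first so each word's span is complete
    # before it is merged into its head's span.
    for i in sorted(range(len(heads)), key=depth, reverse=True):
        h = heads[i]
        if h != -1:
            spans[h] = (min(spans[h][0], spans[i][0]),
                        max(spans[h][1], spans[i][1]))
    return spans


def candidate_cells(heads: List[int]) -> Set[Span]:
    """Multi-word subtree spans: cells a constrained CKY pass might keep."""
    return {s for s in subtree_spans(heads).values() if s[0] < s[1]}


# Hypothetical example: "She ate fresh fish", with "ate" as the root.
heads = [1, -1, 3, 1]
print(subtree_spans(heads))
print(candidate_cells(heads))
```

Full subtree spans capture only maximal projections. A verb phrase such as "ate fresh fish" (span 1 to 3) is not a complete subtree of "ate", so a mapping usable for real parsing would also need intermediate spans between a head and some of its dependents.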
Shelf Number: XWLW1889
Other Identifier: 201018014629093
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/6503
Collection: Graduates_Doctoral Dissertations
Recommended Citation
GB/T 7714
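王志国 (Wang Zhiguo). Research and Implementation of Methods on Phrase Structure Parsing and Word Alignment Based on Dependency Relation[D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2013.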
Files in This Item:
File Name/Size: CASIA_20101801462909 (2464KB)
Access: Not currently open
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.