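汉语句法分析方法研究 (Research on Syntactic Parsing Methods for Chinese)
Alternative Title: Approaches to Syntactic Parsing of Chinese
李幸 (Li Xing)
Subtype: Master of Engineering
Thesis Advisor: 宗成庆 (Zong Chengqing)
2005-05-01
Degree Grantor: Graduate School of the Chinese Academy of Sciences (中国科学院研究生院)
Place of Conferral: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: Syntactic Parsing (句法分析); Natural Language Processing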
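Abstract: Syntactic parsing is one of the key problems in natural language processing. Its main task is to automatically identify the syntactic structure of a sentence, that is, the syntactic units the sentence contains and the relations among those units. Solving the parsing problem is of great importance to natural language processing systems such as machine translation, natural language understanding, information extraction, and automatic summarization. In statistical parsing, the two most critical issues are the design of the parsing algorithm and the design of the disambiguation model, which together determine the efficiency and accuracy of a parsing system. This thesis addresses both issues and implements an efficient Chinese syntactic parser. The main work is as follows:
1. On parsing algorithms, traditional parsing algorithms are analyzed and compared in terms of processing strategy and time and space complexity. On this basis, an improved version of the Chart algorithm, the "role inverse algorithm" (角色反演算法), is studied in detail, and two further improvements to it are proposed. First, the construction of the static data tables used by the algorithm is improved, so that the input tags the algorithm can handle extend from the part-of-speech tags of the smallest syntactic unit, the word, to higher-level syntactic units, namely phrases and sentences. This raises the algorithm's processing capability and efficiency at the cost of very little extra space. Second, the probabilities of the rules are used to sort the static tables, which benefits the search and pruning steps of subsequent analysis.
2. To address the difficulty of parsing long, complex sentences, the role and regularities of punctuation in the composition of long sentences are analyzed, and a hierarchical parsing method for long sentences is proposed. The method divides punctuation marks into two classes, dividing punctuation and ordinary punctuation. A long complex sentence is split at its dividing punctuation into a sequence of sentence units, which are parsed independently in a first-level analysis. The first-level results then serve as the input to a second-level analysis, whose output is the complete parse tree. In addition, extracting grammar rules that contain both classes of punctuation helps, to some extent, to resolve syntactic structural ambiguity. Experiments show that the method greatly reduces the time complexity of parsing long sentences and improves both accuracy and recall by 7% over the traditional one-pass search method.
3. On disambiguation models, a PCFG model that contains internal constituent structure information is proposed on the basis of the traditional probabilistic context-free grammar (PCFG) model. Head word information is then introduced, giving a lexicalized PCFG model that contains both internal constituent structure information and head word information. The thesis also proposes a method of determining head words from internal constituent structure tags, which is more accurate and more intuitive than traditional head word determination methods.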
Other Abstract: The main contributions are summarized as follows:
1. On parsing algorithms, most traditional parsing algorithms are analyzed and compared, mainly in terms of processing strategy, time consumption, and space consumption. A "role inverse algorithm", an improved version of the Chart parsing algorithm, is studied in detail, and this thesis proposes two improvements to it. First, the static rule tables are extended so that the original input of the algorithm can extend from words to phrases and sentences, which improves the processing ability and efficiency of the algorithm. Second, the probabilities of the grammar rules are used to sort the rule tables, which benefits subsequent pruning.
2. To address the difficulty of parsing long Chinese sentences, the usage and function of Chinese punctuation in syntactic parsing are studied, and a hierarchical parsing approach is proposed. It differs from most previous approaches mainly in two respects. First, Chinese punctuation marks are classified as 'divide' punctuation and 'ordinary' punctuation. Long complex sentences that contain 'divide' punctuation are broken into suitable units, so that parsing is carried out in two stages. Most previous approaches use a single-level strategy that must acquire the boundaries of sub-sentences and the syntactic structure of sub-sentences or phrases simultaneously; this 'divide-and-conquer' strategy greatly reduces that difficulty. Second, a system of grammar rules that includes all punctuation marks is built and used in parsing and disambiguating sentences. Experiments show that, when parsing long complex sentences, our approach significantly reduces the time consumption and the number of ambiguous edges compared with traditional methods, and improves accuracy and recall by 7% over the traditional method.
3. On the parsing model for disambiguation, inner structure information is incorporated into a classical probabilistic context-free grammar (PCFG) model to form a new model. Head word information is then introduced into this model, yielding a lexicalized PCFG model that includes both inner structure information and head word information. Finally, this thesis proposes a new method for finding head words, which is simple and considerably more accurate than most other methods.
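The two-stage idea in item 2 of the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the set `DIVIDING_MARKS`, the function names `split_units` and `parse_two_stage`, and the callbacks `parse_unit` and `parse_top` are all hypothetical, and the record does not say which punctuation marks the thesis treats as dividing.

```python
# Assumption: which marks count as "dividing" is NOT given in this record;
# this set is purely illustrative.
DIVIDING_MARKS = {",", ";", ":"}


def split_units(tokens):
    """Split a token list into sentence units at dividing punctuation.

    Returns a list of (unit_tokens, divider) pairs, where divider is the
    dividing mark that closed the unit, or None for a final unit that was
    not closed by one. Ordinary punctuation stays inside its unit.
    """
    units, current = [], []
    for tok in tokens:
        if tok in DIVIDING_MARKS:
            if current:  # ignore a dividing mark with nothing before it
                units.append((current, tok))
                current = []
        else:
            current.append(tok)
    if current:
        units.append((current, None))
    return units


def parse_two_stage(tokens, parse_unit, parse_top):
    """First level: parse each unit independently.
    Second level: parse the sequence of first-level results.

    parse_unit and parse_top are caller-supplied (hypothetical) parsers.
    """
    first_level = [(parse_unit(unit), divider)
                   for unit, divider in split_units(tokens)]
    return parse_top(first_level)
```

For example, with stub parsers, `parse_two_stage(tokens, lambda u: ("UNIT", tuple(u)), lambda items: ("S", items))` wraps each unit in a placeholder node and then combines the nodes under one root. Because each unit is parsed on its own, a real first-level parser works on shorter inputs than the full sentence.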
Shelf Number: XWLW883
Other Identifier: 200228014603551
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/6884
Collection: 毕业生_硕士学位论文 (Graduates_Master's Theses)
Recommended Citation
GB/T 7714
李幸. 汉语句法分析方法研究[D]. 中国科学院自动化研究所. 中国科学院研究生院,2005.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.