Research on the Semantic Understanding of Chinese Dictionary Meanings (汉语词典义的语义理解研究)
Zhao Meijing (赵美静)
2015-12
Degree type: Doctor of Engineering
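Chinese abstract: With the development of computer application technology, semantic information processing has become a closely watched research focus. However, existing semantic knowledge bases generally lack the ability to represent and compute knowledge formally, which constrains further progress in semantic information processing. To promote the construction of a formalized basic semantic knowledge base, this thesis takes contemporary, commonly used dictionaries such as the Modern Chinese Dictionary (现代汉语词典) as its research object and studies the semantic understanding of Chinese dictionary meanings (that is, the meanings given in dictionaries). Based on the Conceptual Knowledge Tree (CKT) knowledge representation model, it proposes a feasible set of methods for storing, managing, analyzing, and computing Chinese dictionary meanings by computer, designs and builds a manual semantic analysis platform for dictionary meanings, and provides an initial implementation of an automatic semantic analysis system for dictionary meanings. The thesis covers four main areas: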
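(1) To formalize the representation of semantic knowledge and to provide a theoretical basis for automatic semantic analysis of dictionary meanings, this thesis adopts CKT as its knowledge representation method, discusses its semantic scope in the dictionary domain, summarizes the theory of its semantic reasoning model, and refines its formal computation theory. By studying how a composite concept and its member concepts are related in semantic elements such as attributes, relations, and behaviors, it gives knowledge reasoning rules for composite concepts. By analyzing attribute inheritance between parent and child concepts and between whole and part concepts, it gives attribute reasoning rules for concepts. Drawing on the computational models of formal semantics, it uses mathematical logic and Lambda calculus to add semantic elements such as independent concepts, composite concepts, attributes, relations, and knowledge trees as basic predicates to the compositional calculus of semantics, achieving formal semantic computation and reasoning in the true sense.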
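(2) To support semantic knowledge base construction and semantic analysis research with dictionary data as the corpus, this thesis preprocesses the dictionary data to convert unstructured raw dictionary text into structured intermediate dictionary-meaning data. Based on the CKT model, it converts the structured dictionary data into initial dictionary-meaning data described by concept symbols, concept word sets, concept definitions, concept parts of speech, examples, phonetic notation, and other information, providing concise, standardized data for semantic analysis of dictionary meanings. While semantically preprocessing the initial dictionary-meaning data, we also obtained rich concept attribute knowledge and concept relation knowledge, which will become an important knowledge reserve for a formalized basic semantic knowledge base.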
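(3) To ensure the correctness of semantic analysis of the intermediate dictionary-meaning data, and to provide a human-computer interaction interface for building the dictionary-meaning knowledge base, this thesis designs and implements a manual semantic analysis platform for dictionary meanings based on the CKT representation model. The platform uses navigational and recursive interaction and, through functional modules for concept management, attribute editing, relation editing, semantic composition, and knowledge tree construction, guides users step by step to understand the principles and components of CKT, so that they can ultimately carry out semantic understanding of dictionary meanings easily through the interactive interface.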
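(4) To speed up construction of the dictionary-meaning semantic knowledge base, and to provide an automatic semantic analysis method for natural language understanding of short texts, this thesis follows the idea of integrating syntax and semantics to complete the semantic analysis of definition patterns for basic-structure dictionary meanings, and proposes a composition algorithm for nested semantics. Building further on the CKT formal representation theory and semantic reasoning model, it proposes a CKT-based rule system. Using the automatic semantic analysis method built on this rule system, experiments were run on dictionary-meaning data of up to 15 characters. For definitions of up to 7 characters, the method's coverage reaches 84.94% and its accuracy reaches 93.33%. The experimental results show that the proposed method is markedly effective on short dictionary-meaning data.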
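Through this work, we obtained a dictionary-meaning semantic knowledge base of 76828 concepts, of which 30326 concept definitions have been understood automatically and 898 have been understood manually. The knowledge base also contains a large amount of concept attribute knowledge and concept relation knowledge. The semantic understanding method proposed and the semantic knowledge base built in this thesis aim to advance Chinese semantic information processing.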
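As a minimal sketch of the attribute reasoning described in item (1), the Python below lets a child concept fall back to its parent's attributes when it does not define one itself. The `Concept` class, the `resolve` method, and the sample concepts are hypothetical. The abstract does not give the actual CKT rules, and whole-part inheritance is not covered here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Concept:
    """A hypothetical concept node; names are illustrative, not taken from the thesis."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    parent: Optional["Concept"] = None  # parent-child link

    def resolve(self, attribute: str) -> Optional[str]:
        """Look up an attribute locally, then walk up the parent chain."""
        node: Optional[Concept] = self
        while node is not None:
            if attribute in node.attributes:
                return node.attributes[attribute]
            node = node.parent
        return None


# Invented sample concepts, used only to exercise the lookup.
animal = Concept("animal", {"alive": "yes"})
bird = Concept("bird", {"covering": "feathers"}, parent=animal)
sparrow = Concept("sparrow", {"size": "small"}, parent=bird)

print(sparrow.resolve("alive"))     # inherited from "animal"
print(sparrow.resolve("covering"))  # inherited from "bird"
print(sparrow.resolve("habitat"))   # defined nowhere, so None
```

An attribute set on the child takes precedence here because the lookup checks the nearest concept first. Whether CKT resolves conflicts this way is an assumption.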
English abstract: With the development of computer application technology, semantic information processing has become a hot research area in natural language processing. Existing semantic repositories generally lack a formalized semantic representation model and a semantic computing model, which restricts the further development of semantic information processing. To promote the construction of a formalized basic semantic knowledge repository, we study the semantic understanding of the Meaning of Chinese Dictionary (MCD), taking common dictionaries as our research object. Based on the Conceptual Knowledge Tree (CKT) model, we propose a set of methods to store, manage, compute, and analyze the semantics of MCD. We design and construct a manual semantic analysis platform and an automatic semantic analysis system for MCD. This thesis studies four main aspects:
(1) To realize formalized semantic representation and to provide a theoretical basis for automatic semantic analysis of MCD, we take CKT as our knowledge representation method, study its semantic definition in the dictionary domain, and then propose the semantic reasoning model and formalized semantic computing theory. We study the semantic relations between composite concepts and member concepts, between parent concepts and child concepts, and between whole concepts and part concepts, and then put forward the semantic reasoning model of CKT. Drawing on the computing model of formal semantics, we use mathematical logic and Lambda calculus as the computational model to represent the semantics of CKT.
(2) We present a data structure for dictionary interpretations based on the CKT model. With the data conversion model, we obtain structured data for research on the semantic understanding of MCD. By preprocessing dictionary data, this model converts unstructured original dictionary text into structured dictionary text. It also provides a series of methods for cleaning dictionary data into MCD data (composed of concept symbols, concepts, definitions, concept word sets, POS, phonetic information, etc.). From the semantic preprocessing of the initial dictionary data, we also obtain a variety of concept attribute knowledge and concept relation knowledge. This knowledge will become an important knowledge reserve for a formalized basic semantic knowledge repository.
(3) We present a semantic analysis platform for dictionary interpretations based on the CKT model, with the purpose of guaranteeing the accuracy and completeness of dictionary semantic understanding. Interactive principles of navigation and recursion are introduced to lead users to understand each component and to fill in the related data, subject to consistency checks. Especially for complex semantic representations, navigational interaction can guide the composite semantics conceived by users into a correct semantic representation. The platform also provides an interactive interface for constructing a formalized basic semantic knowledge repository.
(4) To speed up the construction of the semantic repository and to provide a semantic analysis method for natural language understanding of short text, this thesis applies the idea of syntax-semantics combination to automatic semantic analysis of dictionaries. By summarizing dictionary sentence structures and semantic features, we derive reasoning rules for MCD. The automatic semantic analysis system for short-text dictionary interpretations is implemented based on mathematical logic and Lambda calculus. We run experiments on dictionary interpretations with fewer than 15 characters. The results show that our method performs well on short dictionary interpretations: when the number of characters is less than 7 (58.4% of all cases), the rules cover 84.94% of these interpretations, and the precision reaches 93.33%.
In summary, we build a semantic knowledge repository from dictionaries that covers 76828 concepts, of which 30326 are understood automatically by the algorithm and 898 are understood manually. The semantic understanding method proposed and the semantic knowledge repository constructed in this thesis aim to promote the development of Chinese semantic information processing.
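Item (2) of the abstract lists the fields that describe each dictionary-meaning record. A minimal sketch of such a record as a Python dataclass follows. The class name, field names, and sample entry (including its identifier and definition text) are invented for illustration and are not taken from the thesis data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DictionaryEntry:
    """One dictionary-meaning record; field names are illustrative assumptions."""
    concept_symbol: str   # identifier of the concept
    words: List[str]      # set of words that express the concept
    definition: str       # the dictionary definition to be analyzed
    part_of_speech: str
    examples: List[str] = field(default_factory=list)
    pronunciation: str = ""

    def definition_length(self) -> int:
        """Length of the definition in characters."""
        return len(self.definition)


# Invented sample entry; the symbol format and definition are hypothetical.
entry = DictionaryEntry(
    concept_symbol="C000001",
    words=["书"],
    definition="装订成册的著作",
    part_of_speech="noun",
    pronunciation="shū",
)

# The experiments in item (4) group definitions by character count.
print(entry.definition_length())
```

Keeping the definition length on the record makes it simple to split a corpus into the short and longer groups that the experiments report on.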
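Keywords: Chinese dictionary meaning; Conceptual Knowledge Tree knowledge representation model; formalization; semantics; syntax
Language: Chinese
Document type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/11217
Collection: Graduates_Doctoral Dissertations
Author affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended citation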
GB/T 7714
赵美静. 汉语词典义的语义理解研究[D]. 北京. 中国科学院研究生院,2015.
Files in this item
File name/size: 赵美静博士学位论文_2011180146 (3088KB)
Document type: Thesis
Access type: Not yet open
License: CC BY-NC-SA