Automatic Knowledge Generation Technique of Human-Computer Dialog System (人机口语对话系统的知识自动生成技术)

Title	人机口语对话系统的知识自动生成技术
Alternative title	Automatic Knowledge Generation Technique of Human-Computer Dialog System
Author	黄韵竹 (Huang Yunzhu)
Date	2011-05-27
Degree type	Master of Engineering
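Chinese abstract	Spoken human-computer dialog technology makes human-computer interaction simpler and more natural. However, building a spoken dialog system requires substantial manpower and resources. How to automatically collect training corpora for a restricted-domain language model, and how to build the knowledge base of a spoken dialog system, are two current research challenges. Addressing these problems, this thesis focuses on the everyday conversation (chat) domain and proposes methods for semi-automatically expanding the language model training corpus and for building a spoken dialog knowledge base. The main contents and contributions are as follows: 1. At the level of word-level expansion, a word class expansion method is proposed, and experiments demonstrate its contribution to the speech recognition system. 2. A semi-automatic method for generating a first-order predicate knowledge representation is proposed. The method uses dependency parsing: stop words are first removed from each sentence, the sentence is then parsed, the sentence is next converted into first-order predicate form according to the parsing result and a keyword list, and finally the predicate knowledge base is generated. Experiments show that the knowledge base generated by this method has a high detection rate. 3. The idea of word classes is applied to the spoken dialog knowledge base. Texts are classified by sentence pattern; only one sentence is kept for each pattern, and the others are stored as same-class words in a word class lookup table, which is then further expanded. This method can greatly reduce the size of the knowledge base and increase the system's processing speed. 4. Using word-class corpus expansion and the first-order predicate knowledge representation, the speech globe system is improved.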
English abstract	Human-computer dialog technology makes human-computer interaction simpler and more natural. However, building a human-computer dialog system requires a great deal of manpower and resources. How to automatically collect a training corpus for a language model in a restricted domain, and how to build the knowledge base of a human-computer dialog system, are two challenges in current research. Focusing on the daily chatting domain, we propose approaches for semi-automatically extending the language model training corpus and for building a knowledge base for dialog. The main contents and contributions are as follows: 1. At the level of word-level expansion, a word class expansion method is introduced. We demonstrate the contribution of the method to the speech recognition system through experiments. 2. A semi-automatic method for generating first-order predicate knowledge is proposed, based on dependency parsing. We first remove stop words from each sentence, then analyze the sentence with a dependency parser, next convert the sentence into first-order predicate logic form according to the parsing result and a keyword list, and finally generate the predicate logic knowledge base. Experimental results show that the knowledge base generated by this method achieves a high detection rate. 3. The idea of word classes is applied to the knowledge base of the dialog system. We classify the text according to sentence structure and keep only one sentence for each structure. The others are saved as same-class words in a word class lookup table, and the word classes are then further expanded. This method can greatly reduce the size of the knowledge base and improve processing speed. 4. Applying the word class expansion and first-order predicate logic knowledge representation methods, we improve the speech globe system.
Keywords	Human-computer Dialog System; Word Class Expansion; First Order Predicate Logic; Dependency Parsing; Knowledge Base Generation
Language	Chinese
Document type	Thesis
Identifier	http://ir.ia.ac.cn/handle/173211/7569
Collection	Graduates_Master's Theses
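Recommended citation (GB/T 7714)	Huang Yunzhu. Automatic Knowledge Generation Technique of Human-Computer Dialog System [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences, 2011.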

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20082801462803 (691KB)			Not currently open	CC BY-NC-SA