Research on the Key Technologies for Chinese Question Answering

Title	汉语问答系统关键技术研究
Alternative title	Research on the Key Technologies for Chinese Question Answering
Author	吴友政 (Wu Youzheng)
	2006-06-12
Degree type	Doctor of Engineering
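Chinese abstract	The rapid development and widespread adoption of the Internet have made it easy for people to obtain information from the web. However, the explosive growth of online information has also submerged people in a sea of information, making it much harder to obtain valuable information accurately and quickly. Question answering systems aim to provide a more powerful information access tool to meet the serious challenge posed by the information explosion. Compared with the rapid progress of English question answering research and the release of practical English question answering systems, few research institutions work on Chinese question answering, and essentially no mature Chinese question answering system has appeared. Against this background, this thesis studies Chinese question answering technology in depth. The main work includes: [1] Building a reasonably large and extensible evaluation platform for Chinese question answering. The platform's corpus is about 1.8GB; the test set currently contains 7050 Chinese questions; the scoring criteria mainly follow the TREC scoring criteria. [2] Proposing a question taxonomy for Chinese question answering systems and a question classification algorithm based on multiple features. The thesis proposes a question taxonomy from a new perspective, namely a technical classification and a semantic classification of questions, and on this basis implements a support vector machine question classification algorithm based on multiple features. [3] Designing a Chinese named entity recognition algorithm based on multiple features. The proposed algorithm has the following characteristics: ① it emphasizes combining coarse-grained features (part-of-speech features) with fine-grained features (word-form features); ② it emphasizes combining statistical models with expert knowledge; ③ it designs multiple fine-grained entity models to accurately capture the internal characteristics of different entities. [4] Proposing a sentence retrieval algorithm for Chinese question answering based on a topic language model. The algorithm uses the question classification information specific to question answering systems (that is, the semantic information about the answer to the question) to cluster the initially retrieved sentences by topic, using two algorithms, "one sentence, multiple topics" and "one sentence, one topic", and introduces the topic information of each sentence into the language model through the Aspect Model, thereby obtaining a more precise description of the sentence language model. [5] Proposing a question answering pattern extraction technique based on unsupervised learning. The technique avoids the shortcomings of supervised learning algorithms: it does not require the user to provide <question, answer> pairs as a training set; the user only needs to provide two or more question instances for each question type, and the algorithm can then learn the answer patterns for that type of question.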
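The abstract does not state the formula used to bring topic information into the sentence language model through the Aspect Model. As an illustration only, the sketch below shows one common form such a model can take: a sentence's maximum-likelihood word distribution is interpolated with a mixture over topics. The mixture form, the weight `lam`, the probability floor, and every function and variable name are assumptions, not the thesis's actual method.

```python
import math
from collections import Counter
from typing import Dict, List


def topic_smoothed_prob(
    word: str,
    sentence: List[str],
    p_word_given_topic: Dict[str, Dict[str, float]],
    p_topic_given_sentence: Dict[str, float],
    lam: float = 0.5,
) -> float:
    """Assumed form: P(w|S) = lam * P_ml(w|S) + (1 - lam) * sum_t P(w|t) * P(t|S)."""
    counts = Counter(sentence)
    # Maximum-likelihood estimate from the sentence itself.
    p_ml = counts[word] / len(sentence) if sentence else 0.0
    # Aspect-style mixture over the topics assigned to the sentence.
    p_topic = sum(
        p_t * p_word_given_topic.get(topic, {}).get(word, 0.0)
        for topic, p_t in p_topic_given_sentence.items()
    )
    return lam * p_ml + (1.0 - lam) * p_topic


def query_log_likelihood(
    query: List[str],
    sentence: List[str],
    p_word_given_topic: Dict[str, Dict[str, float]],
    p_topic_given_sentence: Dict[str, float],
    lam: float = 0.5,
    floor: float = 1e-12,
) -> float:
    """Score a sentence by the log-likelihood of the query under its model."""
    return sum(
        math.log(
            max(
                topic_smoothed_prob(
                    w, sentence, p_word_given_topic, p_topic_given_sentence, lam
                ),
                floor,  # hypothetical floor so unseen words do not give log(0)
            )
        )
        for w in query
    )
```

Under this sketch, a "one sentence, one topic" assignment would correspond to `p_topic_given_sentence` holding a single topic with probability 1.0, while "one sentence, multiple topics" would spread the probability over several topics.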
English abstract	With the rapid development and broad popularization of the Internet, it is convenient for people to get information from the web. Meanwhile, the explosive growth of web information submerges users in an ocean of information, and it becomes very difficult to obtain exact and correct information quickly. Question Answering (QA) aims at providing more powerful information access tools to help users overcome the problem of information overload. Compared with the development of English QA technologies, research on Chinese QA is still at an early stage. The thesis focuses on the key technologies of Chinese Question Answering. The main contributions and novelties are summarized as follows. [1] Build an Evaluation Platform for Chinese Question Answering. The evaluation platform is composed of the corpus serving as the primary source of answers (about 1.8GB from the Internet), the testing question set (7050 testing questions), and evaluation metrics in terms of Mean Reciprocal Rank. [2] Present a Chinese Question Taxonomy and SVM Classifiers Based on Multiple Features. The thesis presents a new question taxonomy from the views of semantic typology and methodological typology, and an SVM classification algorithm based on multiple features. [3] Propose a Chinese Named Entity Recognition Model Based on Multiple Features. The thesis proposes a Chinese named entity recognition (NER) model based on multiple features. It differs from most previous approaches mainly as follows. First, the proposed model integrates coarse-grained features (POS model) with fine-grained features (word model), so that each compensates for the other's weaknesses. Second, in order to reduce the search space and improve efficiency, heuristic human knowledge is introduced into the statistical model. Third, several sub-models are designed to describe, respectively, three kinds of transliterated person names, single-character and multi-word location names, and abbreviated and full organization names. [4] Present a Topic-based Language Model for Sentence Retrieval in Chinese Question Answering. The main idea is to make use of a characteristic peculiar to the question answering scenario, that is, the semantic category of the expected answer, to conduct topic segmentation, and then to incorporate the sentence topic information into the standard language model. For topic segmentation, we propose two approaches, One-Sentence-One-Topic and One-Sentence-Multi-Topics. [5] Present Unsupervised Answer Pattern Learning for Answer Extraction in Chinese Question Answering. An unsupervised learning algorithm is presented for answer pattern learning, which avoids the disadvantages of supervised learning algorithms. Given two or more questions of one question type, the algorithm can learn answer patterns from the Internet via web search, topic segmentation, pattern extraction, vertical clustering and horizontal clustering, etc.
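The abstract names Mean Reciprocal Rank (MRR) as the evaluation metric. A minimal sketch of the standard MRR computation follows; the function name, the input format, and the convention that a question with no correct answer in the returned list contributes 0 are assumptions, since the record does not describe the platform's scoring details.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(first_correct_ranks: Sequence[Optional[int]]) -> float:
    """Compute MRR from the 1-based rank of the first correct answer per question.

    A value of None means no correct answer was returned for that question,
    which contributes a reciprocal rank of 0 (assumed convention).
    """
    if not first_correct_ranks:
        raise ValueError("at least one question is required")
    total = 0.0
    for rank in first_correct_ranks:
        if rank is None:
            continue
        if rank < 1:
            raise ValueError("ranks are 1-based")
        total += 1.0 / rank
    return total / len(first_correct_ranks)
```

For example, four questions whose first correct answers appear at ranks 1, 2, none, and 4 give (1 + 1/2 + 0 + 1/4) / 4 = 0.4375.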
Keywords	Chinese Question Answering; Evaluation of Question Answering; Chinese Named Entity Recognition; Sentence Retrieval; Answer Pattern Extraction
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/5948
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	吴友政. 汉语问答系统关键技术研究[D]. 中国科学院自动化研究所. 中国科学院研究生院,2006.

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20021801460322 (1941KB)			Not yet open	CC BY-NC-SA