With the rapid development and widespread adoption of the Internet, people can conveniently obtain information from the web. At the same time, the explosive growth of web content leaves users adrift in an ocean of information, and it has become very difficult to find exact and correct information quickly. Question Answering (QA) aims to provide more powerful information access tools that help users overcome information overload. Compared with English QA technology, research on Chinese QA is still at an early stage. This thesis focuses on the key technologies of Chinese Question Answering. Its main contributions and novelties are summarized as follows.

[1] Build an Evaluation Platform for Chinese Question Answering
The evaluation platform consists of a corpus serving as the primary source of answers (about 1.8 GB collected from the Internet), a test set of 7,050 questions, and evaluation metrics based on Mean Reciprocal Rank (MRR).

[2] Present a Chinese Question Taxonomy and SVM Classifiers Based on Multiple Features
The thesis presents a new question taxonomy from the perspectives of semantic typology and methodological typology, together with an SVM classification algorithm based on multiple features.

[3] Propose a Chinese Named Entity Recognition Model Based on Multiple Features
The thesis proposes a Chinese named entity recognition (NER) model based on multiple features. It differs from most previous approaches in the following ways. First, the model integrates a coarse-grained feature (a POS model) with a fine-grained feature (a word model), so that each compensates for the weaknesses of the other. Second, to reduce the search space and improve efficiency, heuristic human knowledge is introduced into the statistical model.
Third, several sub-models are designed to describe three kinds of entities respectively: transliterated person names, single-character and multi-word location names, and abbreviated and full organization names.

[4] Present a Topic-based Language Model for Sentence Retrieval in Chinese Question Answering
The main idea is to exploit a characteristic peculiar to the question answering scenario, namely the semantic category of the expected answer, to perform topic segmentation, and then to incorporate the topic information of each sentence into the standard language model. For topic segmentation, two approaches are proposed: One-Sentence-One-Topic and One-Sentence-Multi-Topics.

[5] Present Unsupervised Answer Pattern Learning for Answer Extraction in Chinese Question Answering
An unsupervised algorithm for learning answer patterns is presented, which overcomes the drawbacks of supervised learning algorithms. Given two or more questions of the same question type, the algorithm learns answer patterns from the Internet through web search, topic segmentation, pattern extraction, vertical clustering, horizontal clustering, and related steps.
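The evaluation platform in [1] scores systems by Mean Reciprocal Rank. Under its standard definition, MRR = (1/|Q|) * sum over questions q of 1/rank_q, where rank_q is the position of the first correct answer returned for q, and a question with no correct answer contributes 0. The minimal Python sketch below assumes a hypothetical input format (one first-correct-rank per question, or None) and an assumed cutoff `max_rank = 5`, echoing the top-5 convention of early TREC QA tracks; the abstract itself does not state which cutoff the platform uses.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(first_correct_ranks: Sequence[Optional[int]],
                         max_rank: int = 5) -> float:
    """Compute MRR over a test set.

    first_correct_ranks[i] is the 1-based rank of the first correct answer
    returned for question i, or None if no correct answer was returned.
    max_rank is an assumed cutoff: correct answers ranked beyond it score 0.
    """
    if not first_correct_ranks:
        raise ValueError("empty test set")
    total = 0.0
    for rank in first_correct_ranks:
        # Only a correct answer within the cutoff earns 1/rank credit.
        if rank is not None and 1 <= rank <= max_rank:
            total += 1.0 / rank
    return total / len(first_correct_ranks)


# Four hypothetical questions: first correct answer at rank 1, rank 2, none, rank 4.
print(mean_reciprocal_rank([1, 2, None, 4]))  # prints 0.4375
```

Here (1 + 1/2 + 0 + 1/4) / 4 = 0.4375. Because only the first correct answer counts, MRR rewards systems that place a correct answer near the top rather than systems that return many correct answers further down.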
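The abstract for [4] does not give the exact form of the topic-based language model. One common way to incorporate topic information into a standard query-likelihood language model is a three-way linear (Jelinek-Mercer style) interpolation: P(w|S) = λ_s P_ml(w|S) + λ_t P_ml(w|T) + λ_c P_ml(w|C), where T pools all sentences sharing sentence S's topic and C is the whole collection. The Python sketch below assumes that form, assumes the One-Sentence-One-Topic segmentation has already labeled each sentence with a topic (for example, the expected answer category), and uses hypothetical weights; it is an illustration, not the thesis's model.

```python
import math
from collections import Counter
from typing import Dict, List


def ml_prob(word: str, counts: Counter, total: int) -> float:
    # Maximum-likelihood estimate P(w | text); 0 for an empty text.
    return counts[word] / total if total else 0.0


def topic_lm_scores(query: List[str],
                    sentences: List[List[str]],
                    topics: List[str],
                    lam_s: float = 0.5,
                    lam_t: float = 0.3,
                    lam_c: float = 0.2) -> List[float]:
    """Score each pre-tokenized sentence by log P(query | sentence model).

    topics[i] is the topic label assigned to sentence i by an (assumed)
    One-Sentence-One-Topic segmentation. The lambda weights are
    hypothetical and must sum to 1.
    """
    assert abs(lam_s + lam_t + lam_c - 1.0) < 1e-9
    # Collection model: all sentences pooled together.
    coll = Counter(w for s in sentences for w in s)
    coll_total = sum(coll.values())
    # Topic model: pool the sentences that share each topic label.
    topic_counts: Dict[str, Counter] = {}
    for sent, topic in zip(sentences, topics):
        topic_counts.setdefault(topic, Counter()).update(sent)

    scores = []
    for sent, topic in zip(sentences, topics):
        s_counts = Counter(sent)
        t_counts = topic_counts[topic]
        t_total = sum(t_counts.values())
        log_p = 0.0
        for w in query:
            p = (lam_s * ml_prob(w, s_counts, len(sent))
                 + lam_t * ml_prob(w, t_counts, t_total)
                 + lam_c * ml_prob(w, coll, coll_total))
            # A query word unseen in the whole collection gets a tiny floor
            # so the log stays finite (another simplifying assumption).
            log_p += math.log(max(p, 1e-12))
        scores.append(log_p)
    return scores


# Hypothetical, already-segmented sentences with topic labels.
sentences = [["everest", "height", "8848", "meters"],
             ["everest", "located", "nepal"],
             ["height", "measured", "1955"]]
topics = ["NUMBER", "LOCATION", "NUMBER"]
scores = topic_lm_scores(["everest", "height"], sentences, topics)
print(sorted(range(len(scores)), key=lambda i: -scores[i]))  # prints [0, 2, 1]
```

In this toy example, sentence 2 outranks sentence 1 partly because it shares the NUMBER topic with sentence 0, so the topic component lends it probability mass for "everest" that its own words do not supply.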