With the rapid development of the Internet, the amount of available information grows every day. Extracting useful information from this huge volume of electronic information is an urgent task, and text occupies a very important position among all forms of electronic information. Research on text retrieval and text categorization therefore has great value in both theory and practice. This paper studies text retrieval and automatic text categorization.

First, we study text retrieval. Many models exist for Chinese text retrieval, including Boolean indexing, the statistics-based vector space model (VSM), probabilistic retrieval, and retrieval based on semantic networks. After analyzing these models, this paper explores text retrieval using a conceptual network as a tool: it examines how to organize domain knowledge with a conceptual network and how to use that domain knowledge in text retrieval.

Second, we study automatic text categorization. Most current text categorization systems are based on the VSM: each text is represented as a vector, and the class a text belongs to is determined by the distance between vectors. Because the VSM does not take the relationships between features into account, its results are sometimes not precise enough. To address this, we propose a text categorization algorithm based on a knowledge tree. It simulates how humans classify text and uses the knowledge tree as the basis for categorization. When computing the association degree between a text and a class, it considers the structure of the text and dynamically weights the keywords. Experiments show that this algorithm achieves better recall than the VSM-based KNN algorithm, and that results improve further as the knowledge tree becomes more complete.
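The VSM-based KNN baseline that the proposed algorithm is compared against can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: it assumes raw term-frequency vectors, cosine similarity as the distance measure, a similarity-weighted vote among the k nearest neighbours, and a tiny hypothetical training set of already-tokenized texts (Chinese text would first need word segmentation).

```python
import math
from collections import Counter


def tf_vector(tokens):
    """Represent a tokenized text as a sparse term-frequency vector."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts mapping term -> weight)."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def knn_classify(tokens, training, k=3):
    """Return the class with the largest similarity-weighted vote among the k nearest texts."""
    query = tf_vector(tokens)
    scored = sorted(
        ((cosine(query, tf_vector(doc)), label) for doc, label in training),
        reverse=True,
    )
    votes = Counter()
    for sim, label in scored[:k]:
        votes[label] += sim
    return votes.most_common(1)[0][0]


# Hypothetical toy training set: (tokens, class label).
training = [
    (["stock", "market", "price", "rise"], "finance"),
    (["bank", "interest", "rate", "market"], "finance"),
    (["football", "match", "goal", "team"], "sports"),
    (["team", "coach", "season", "match"], "sports"),
]

print(knn_classify(["market", "price", "bank"], training, k=3))  # finance
```

Each term is an independent dimension here, so "bank" and "interest" contribute nothing to each other's similarity unless they literally co-occur. This is the limitation of the VSM that the knowledge-tree approach targets.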
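The knowledge-tree categorization algorithm is described here only at a high level, so the Python sketch below is a hypothetical illustration of the idea, not the algorithm evaluated in the experiments. Its assumptions: each tree node is a class with a set of keywords and a list of subclasses; a text is split into structural sections (`title`, `abstract`, `body`), and a keyword's weight is the sum of hypothetical per-section weights (`SECTION_WEIGHTS`) over its occurrences; the association degree between a text and a class is the total weight of the text's keywords found anywhere in that class's subtree; and classification descends the tree from the root, choosing the most strongly associated subclass at each level, roughly as a person first picks a broad topic and then a narrower one.

```python
from dataclasses import dataclass, field

# Hypothetical structural weights: a keyword in the title counts more than one in the body.
SECTION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}


@dataclass
class Node:
    """A node of a hypothetical knowledge tree: class label, keywords, and subclasses."""
    label: str
    keywords: set[str]
    children: list["Node"] = field(default_factory=list)


def keyword_weights(sections):
    """Weight each keyword dynamically by where and how often it occurs in the text."""
    weights = {}
    for section, tokens in sections.items():
        w = SECTION_WEIGHTS.get(section, 1.0)
        for tok in tokens:
            weights[tok] = weights.get(tok, 0.0) + w
    return weights


def subtree_keywords(node):
    """Collect the keywords of a node and of all its descendants."""
    words = set(node.keywords)
    for child in node.children:
        words |= subtree_keywords(child)
    return words


def association(node, weights):
    """Association degree: summed weight of the text keywords found in the node's subtree."""
    return sum(weights.get(k, 0.0) for k in subtree_keywords(node))


def classify(root, sections):
    """Descend from the root, choosing the most strongly associated child at each level."""
    weights = keyword_weights(sections)
    path, node = [], root
    while node.children:
        best = max(node.children, key=lambda c: association(c, weights))
        if association(best, weights) == 0.0:
            break  # no evidence for any subclass; stop at the current level
        path.append(best.label)
        node = best
    return path


# Hypothetical two-level knowledge tree.
tree = Node("root", set(), [
    Node("economy", {"economy", "market"}, [
        Node("finance", {"stock", "bank", "interest"}),
        Node("trade", {"export", "import", "tariff"}),
    ]),
    Node("sports", {"sports", "match"}, [
        Node("football", {"football", "goal"}),
        Node("basketball", {"basketball", "dunk"}),
    ]),
])

text = {"title": ["bank", "interest", "rate"], "body": ["market", "stock", "match"]}
print(classify(tree, text))  # ['economy', 'finance']
```

Because "bank" and "interest" sit under the same subtree, they reinforce each other's evidence for "economy" even though they are different terms, which a plain term vector cannot express. Using an unnormalized sum favors classes with large subtrees; dividing by subtree size is one alternative, and which choice works better would have to be checked experimentally.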