With the rapid growth of Internet resources, people pay increasing attention to information. It is therefore very important to access and use information effectively. Text classification is a key technology in information retrieval and knowledge mining, and improving classification efficiency has become a research focus in information retrieval. For text classification, improving the representational capability of features is the central research problem. The traditional approach builds the vector space model with the individual word as its unit, so the words that are key to a document's content and the associative relations between words are not captured. Few recent studies analyze the relationships between words. Starting from an analysis of document type features, we raise the problem of studying the relations between words and analyze it from two aspects: association and correlation. The work in this paper includes: 1. To address the shortcomings of traditional feature selection, a new feature selection algorithm based on association rules and keywords is presented. The algorithm checks association rules by nonlinear correlation analysis to produce a feature space that is closely related to the category attribute. Experiments indicate that this method yields better categorization results than the traditional one. 2. Term correlation based on linear analysis and a text classifier combining the LLSF and KNN classifiers are proposed. Furthermore, a new voting method for KNN is designed. The experimental results show that the new classifier achieves higher classification accuracy and efficiency.