Text classification is a long-standing topic in machine learning, but the classification of technical text, a special kind of text, has received little attention even as the need for it steadily grows. This thesis therefore studies technical text classification, taking environmental technical text as the example domain. We constructed a database of technical text samples, all drawn from real technical literature, together with a corresponding dictionary. We then studied the problem from three points of view. First, treating it as a multi-class classification problem, we applied the Learning with Local and Global Consistency algorithm and proposed a modification based on the characteristics of technical text classification. Second, treating it as a two-class classification problem, we proposed a hierarchical classification model for the first time; the results showed that the model improves categorization accuracy stably and efficiently. Third, treating it as a one-class classification problem, we proposed an algorithm that combines a local linear approach with One-Class. The local linear approach is used to find the manifold of the text samples and to define the interface between positive and negative samples. Compared with standard SVM and One-Class SVM, this algorithm offers high precision, simple parameter estimation, easy control of precision, and low computation time, and it provides an effective approach to text classification.
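As a rough illustration of the multi-class setting, the following is a minimal sketch of the standard Learning with Local and Global Consistency label propagation in its closed form. It is not the modified variant proposed in this thesis. The function name `llgc`, the RBF affinity, the parameters `alpha` and `sigma`, and the toy data are all assumptions made for illustration; unlabeled samples are marked with -1.

```python
import numpy as np


def llgc(X, y, alpha=0.99, sigma=1.0):
    """Standard LLGC sketch (hypothetical helper, not the thesis's modified version).

    X: (n, d) feature matrix; y: (n,) integer labels, with -1 for unlabeled samples.
    Returns a predicted class label for every sample.
    """
    n = X.shape[0]
    classes = np.unique(y[y >= 0])

    # RBF affinity matrix with a zero diagonal (no self-similarity).
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalization: S = D^{-1/2} W D^{-1/2}.
    inv_sqrt_deg = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]

    # Initial label matrix: one-hot rows for labeled samples, zeros otherwise.
    Y = np.zeros((n, len(classes)))
    for j, c in enumerate(classes):
        Y[y == c, j] = 1.0

    # Closed-form solution, up to a constant factor: F = (I - alpha * S)^{-1} Y.
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return classes[np.argmax(F, axis=1)]


# Toy data (assumed): two well-separated clusters, one labeled sample in each.
X = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5],
              [10.0, 10.0], [10.5, 10.0], [10.0, 10.5]])
y = np.array([0, -1, -1, 1, -1, -1])
pred = llgc(X, y)
print(pred)
```

In a text setting, `X` would typically hold document feature vectors (for example term weights built from the dictionary), and only a small share of the samples would carry labels.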