The research on innovative ideology and methodology in the automation discipline aims to analyze systematically the factors that play important roles in the development of the domestic automation discipline, and to explore the relationships among those factors in order to build a knowledge system. The ultimate goal is a network platform that offers knowledge services to potential users. These factors, which include research objects, researchers, institutions, methods, theories, and tools, are central to the knowledge system, so extracting them precisely is essential. This paper designs and implements a text mining system focused on information extraction. The main contributions are summarized as follows:

① Text categorization and feature word selection are applied to data cleaning. A vector space model approach is implemented to predict article categories, and a feature selection method named chifit is proposed, which achieves higher precision with a lower feature dimension.
② Keyword clustering is addressed by a method that reduces semantic clustering to the computation of morphological similarity.
③ A novel scheme, the "knowledge pedigree", is proposed and implemented to help users with literature research and knowledge understanding.
④ A divisive clustering approach is used for person-institution alignment. The procedure closely resembles constructing a binary tree in level-order traversal.
⑤ To evaluate scholars' academic influence precisely, a graph-based clustering approach is presented for person name disambiguation.
⑥ An unsupervised institution name normalization method is proposed, which fully exploits the institution data within each person entity.

Key Words: text mining, text classification, text clustering, named entity recognition and disambiguation, knowledge service
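
As an illustration of contribution ②, the following is a minimal sketch of how keyword clustering might be reduced to a surface-form (morphological) similarity computation. The abstract does not specify the similarity measure or the grouping procedure, so everything here is an assumption: the measure is the ratio from Python's standard `difflib.SequenceMatcher`, the grouping is single-link merging with union-find, and the names `similarity`, `cluster_keywords`, and the threshold of 0.8 are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Surface-form similarity in [0, 1].

    Assumed measure for illustration only; not the measure used in the paper.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_keywords(keywords, threshold=0.8):
    """Group keywords whose pairwise similarity reaches the threshold.

    Single-link grouping: two keywords share a cluster if a chain of
    sufficiently similar pairs connects them. The threshold is hypothetical.
    """
    parent = list(range(len(keywords)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(keywords)):
        for j in range(i + 1, len(keywords)):
            if similarity(keywords[i], keywords[j]) >= threshold:
                parent[find(j)] = find(i)  # merge the two clusters

    clusters = {}
    for i, kw in enumerate(keywords):
        clusters.setdefault(find(i), []).append(kw)
    return list(clusters.values())


if __name__ == "__main__":
    sample = ["neural network", "neural networks",
              "PID control", "PID controller", "fuzzy logic"]
    for group in cluster_keywords(sample):
        print(group)
```

Under these assumptions, near-identical surface forms such as "PID control" and "PID controller" fall into one group, while unrelated keywords stay separate. The pairwise comparison is quadratic in the number of keywords, which is acceptable for a sketch but would need blocking or indexing on a large keyword set.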