With the rapid growth of information on the Internet, much research has focused on improving the performance of information retrieval (IR) systems through natural language processing (NLP). Computing the correlation between texts is a fundamental technique in an IR system and directly affects retrieval results. However, the traditional method computes correlation by keyword string matching, which is inadequate for complex text correlation problems. This paper therefore addresses the correlation between papers in the automation discipline on the basis of semantic analysis. The main content of this paper is as follows:

1. Build Keyword Networks
First, I analyze the elements and structure of papers. Then I introduce the characteristics of keywords and explain five of them in detail: first position, term sequence, TF-IDF value, part of speech, and document length. I also discuss four common ways of extracting keywords and choose the statistics-based method. Finally, I define keyword networks and propose three kinds of nodes in them: the "core-word" node, the "leaf-word" node, and the "potential-word" node.

2. Explain the Structure of Knowledge Representation: HowNet and Conceptual Knowledge Tree
In HowNet, the sememe is the basic unit of semantic meaning, and each concept is composed of sememes and expressed in Knowledge Representation Language. In the Conceptual Knowledge Tree, a concept is described by its attributes, relations, and behaviors. I use these two knowledge representation systems to analyze automation discipline words on a semantic basis.

3. Analyze Correlation Between Texts
First, I use HowNet as the semantic representation foundation to compute the similarity between common words, and I improve the algorithms for computing similarity between sememes and between concepts. When computing similarity between sememes, I take the height and density of the sememe tree into account. When computing similarity between concepts, I address the weighting of sememes by handling each case separately. Second, I analyze the structure of automation discipline words, propose an algorithm that determines their semantic meaning with the help of the Conceptual Knowledge Tree, and then compute the similarity between these words. Finally, I present the algorithm for computing correlation between papers: map papers...
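
As an illustration of the statistics-based keyword extraction mentioned in part 1, the following is a minimal sketch that ranks the terms of one document by TF-IDF. It is not the method used in this paper: the function name `tfidf_keywords`, the toy corpus, and the plain `tf * log(N / df)` weighting are all assumptions made for the example, and the other keyword characteristics (first position, term sequence, part of speech, document length) are left out.

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the terms of docs[doc_index] by TF-IDF and return the top_k.

    docs is a list of token lists. The weighting tf * log(N / df) is an
    assumed textbook variant, not the formula from the paper.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    counts = Counter(docs[doc_index])
    total = len(docs[doc_index])

    # Term frequency (relative count) times inverse document frequency.
    scores = {
        term: (count / total) * math.log(n_docs / df[term])
        for term, count in counts.items()
    }

    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Hypothetical toy corpus of three tokenized titles.
docs = [
    "pid controller tuning for a pid loop".split(),
    "neural network controller for a robot arm".split(),
    "fuzzy logic for a temperature loop".split(),
]

keywords = tfidf_keywords(docs, 0, top_k=2)
for term, score in keywords:
    print(f"{term}: {score:.4f}")
```

Terms that occur in every document, such as "for" and "a" in this toy corpus, receive a score of zero, which is why a purely statistical ranking can already separate candidate keywords from function words before any semantic analysis is applied.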