Named entity disambiguation is a key technique in information extraction and integration. It resolves name ambiguity, which is common in textual information, and plays an important role in areas such as knowledge engineering, information retrieval, and the Semantic Web. High-performance named entity disambiguation, however, depends critically on semantic knowledge. In recent years, large-scale knowledge sources have become increasingly available on the Web. Unfortunately, these sources are usually heterogeneous, and the semantic knowledge within them is encoded in complex structures, which makes it difficult to use across tasks. Mining and integrating the semantic knowledge contained in heterogeneous knowledge sources is therefore critical to named entity disambiguation and to many other natural language processing tasks. This thesis focuses on mining semantic knowledge from the Web and on entity disambiguation and entity linking methods based on the mined knowledge. The main contributions and novelties are summarized as follows.
[1] Structural Knowledge Mining and Integration: Structural Semantic Relatedness. Most semantic knowledge contained in structural knowledge sources can be represented as semantic relatedness between concepts. This thesis proposes a novel representation model for structural semantic knowledge, the semantic-graph, which uniformly represents the structural semantic knowledge extracted from multiple knowledge sources. We then propose the Structural Semantic Relatedness (SSR) measure to capture both the explicit and the implicit semantic knowledge contained in the semantic-graph. Experimental results show that the SSR method significantly outperforms traditional bag-of-words (BOW) methods by 9.7% and social-network-based methods by 15.7%.
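To make the idea of explicit and implicit relatedness on a semantic-graph more concrete, the following is a minimal sketch and not the SSR measure itself, whose exact definition is not given in this abstract. It assumes that each knowledge source supplies (concept, concept, weight) triples, that overlapping edges are merged by taking the maximum weight, and that implicit relatedness is approximated by a decayed sum over short walks. The function names, the merge rule, and the `decay` and `max_hops` parameters are all hypothetical.

```python
from collections import defaultdict


def build_semantic_graph(sources):
    """Merge (concept_a, concept_b, weight) triples from several sources
    into one undirected weighted graph (assumed rule: keep the max weight)."""
    graph = defaultdict(dict)
    for triples in sources:
        for a, b, w in triples:
            w = max(w, graph[a].get(b, 0.0))
            graph[a][b] = w
            graph[b][a] = w
    return graph


def relatedness(graph, source, target, decay=0.5, max_hops=3):
    """Illustrative score: the direct edge weight (explicit knowledge) plus
    decayed weights of longer walks (implicit knowledge).
    This is an assumed stand-in, not the thesis's SSR formula."""
    frontier = {source: 1.0}  # node -> summed weight of walks reaching it
    score = 0.0
    for hop in range(1, max_hops + 1):
        reached = defaultdict(float)
        for node, mass in frontier.items():
            for neighbor, w in graph.get(node, {}).items():
                reached[neighbor] += mass * w
        # Walks of length `hop` are discounted by decay ** (hop - 1).
        score += (decay ** (hop - 1)) * reached.get(target, 0.0)
        frontier = reached
    return score


if __name__ == "__main__":
    # Toy triples standing in for two heterogeneous knowledge sources.
    source_a = [("Jordan", "NBA", 0.9), ("NBA", "Bulls", 0.8)]
    source_b = [("Jordan", "NBA", 0.5), ("Berkeley", "Statistics", 0.7)]
    g = build_semantic_graph([source_a, source_b])
    print(relatedness(g, "Jordan", "Bulls"))
    print(relatedness(g, "Jordan", "Berkeley"))
```

In this toy graph, "Jordan" and "Bulls" share no direct edge, so their score comes entirely from the two-hop walk through "NBA", while "Jordan" and "Berkeley" lie in disconnected components and score zero.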
[2] Unstructured Knowledge Mining and Integration: the Entity Language Model. Unstructured knowledge sources contain rich probabilistic semantic knowledge that can enhance a named entity disambiguation system. To mine and integrate this probabilistic semantic knowledge, this thesis proposes a knowledge representation model, the Entity Language Model. Based on the entity language model, we demonstrate how to mine semantic knowledge about an entity by exploiting unstructured knowledge sources. To resolve the sample sparseness problem in entity language model estimation, we propose two sam...