Named entity ambiguity means that the same entity mention can refer to different entities in different contexts. It causes serious problems across the information processing community, in tasks such as machine translation and information extraction. Entity linking is an approach to resolving named entity ambiguity. The task of an entity linking system is to link an entity mention in a background source document to the corresponding real-world entity in an existing knowledge base. Research on entity linking has great academic and applied value in the fields of knowledge engineering, information retrieval, and natural language processing.

This thesis focuses on the key problem of entity linking: measuring the semantic similarity between the context of an entity mention and a candidate entity. The main work and contributions of this thesis are summarized as follows:

1. A concept-based language model is proposed for the entity linking task

To overcome the limitations of the traditional bag-of-words (BOW) method and to obtain a better measure of semantic relatedness between the context of an entity mention and a candidate entity, this thesis proposes a concept-based language model for entity linking. The model represents both the query and the entity using Wikipedia concepts instead of single words. The concepts are taken from Wikipedia, a very comprehensive, human-defined ontology. We believe that mapping the query and the entity to high-level concepts will result in a model that is less dependent on the specific terms used in the query text and in the entity's document. It could yield matches even when the same concept is described by different terms in the query and the entity. To better capture the semantic knowledge in the structural information of Wikipedia, we develop two methods to estimate the concept language model for an entity. One is based on the link structure between the entity and Wikipedia concepts. The other is based on the category information of the entity.
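The idea of scoring candidates in concept space rather than word space can be illustrated with a minimal sketch. It is not the thesis's actual model: the Jelinek-Mercer smoothing, the add-one background estimate, the function names, and the toy concept lists are all assumptions made for illustration. In practice the concept list of an entity might be derived from its Wikipedia links or its categories, as described above.

```python
import math
from collections import Counter

def concept_lm(entity_concepts, background, lam=0.5):
    """Build a smoothed P(concept | entity) function.

    Assumption: Jelinek-Mercer interpolation with an add-one smoothed
    background model; the thesis may use a different estimator.
    """
    counts = Counter(entity_concepts)
    total = sum(counts.values())
    bg_total = sum(background.values())
    vocab = len(background) + 1  # +1 slot for unseen concepts

    def prob(concept):
        p_entity = counts[concept] / total if total else 0.0
        p_background = (background[concept] + 1) / (bg_total + vocab)
        return lam * p_entity + (1 - lam) * p_background

    return prob

def score(query_concepts, entity_concepts, background, lam=0.5):
    """Log-likelihood of the query concepts under the entity's concept model."""
    prob = concept_lm(entity_concepts, background, lam)
    return sum(math.log(prob(c)) for c in query_concepts)

def link(query_concepts, candidates, lam=0.5):
    """Return the candidate entity whose concept model best explains the query."""
    background = Counter(c for cs in candidates.values() for c in cs)
    return max(
        candidates,
        key=lambda e: score(query_concepts, candidates[e], background, lam),
    )

# Hypothetical toy data: concepts found in the context of the mention "Jordan".
query = ["Basketball", "NBA", "Chicago Bulls"]
candidates = {
    "Michael Jordan": ["Basketball", "NBA", "Chicago Bulls", "Shooting guard"],
    "Jordan (country)": ["Middle East", "Amman", "Arab League"],
}
best = link(query, candidates)
print(best)
```

Because both sides are compared as concepts, a context that mentions "the Bulls" and an entity page that writes "Chicago Bulls" would match once both are mapped to the same Wikipedia concept, which a word-level model would miss.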
To evaluate the effectiveness of the proposed method, we conduct experiments on the standard KBP datasets. Experimental results show that the proposed method obtains a 6.1% improvement over the traditional word-based language model. Compared with the state-of-the-art approach, the proposed method also achieves a 1.8% improvement.

2. A learning-to-rank framework is proposed for the entity linking task

In order to capture the structure information from Wikipedia to better estimate the semantic relate...