English abstract | Information retrieval, which covers the organization, representation, querying, and access of information, supplies a set of technologies for obtaining information rapidly and accurately. Information retrieval systems, which usually focus on text retrieval, find relevant documents based on users' queries. Two key technologies are involved: indexing and similarity computing. Traditional retrieval methods based on keyword matching often yield low precision. With the development of the information society and the World Wide Web, these methods can no longer satisfy users' requirements. Intelligent retrieval has become a research hot spot and will be a key technology in the next generation of the World Wide Web. In most cases, the content of a text is expressed in natural language. The key challenge of intelligent retrieval is natural language understanding, that is, finding the meaning behind the text. We believe that word-based semantic models are suitable for representing the shallow meaning of text. Therefore, we use word-based semantic models to supervise the process of information retrieval in order to improve retrieval performance. The thesis first gives a brief introduction to the background of information retrieval and semantic models. Then three kinds of semantic models (the Latent Semantic Indexing series, Semantic Tree, and Semantic Tensor) are proposed and evaluated in the field of information retrieval. After that, we present the NLPR IR System, including its architecture, module definitions, and implementation. TREC evaluations are used to evaluate the system. In summary, the contributions of the thesis are as follows.
(1) It explains how to use word-based semantic models to supervise the process of information retrieval; (2) Building on LSI and PLSI, weakly-supervised probabilistic latent semantic indexing (SPLSI) is presented and evaluated; it yields a more reasonable semantic space and can be used in the indexing process; (3) The Semantic Tree Model (STM) is developed to create a dynamic, flexible, controllable, and real-time semantic space. As a new indexing technology, STM outperforms most existing methods; (4) Semantic Tensor is put forward as a new theory, which is expressed by two key notions. Three window-based models of this theory are developed to compute the similarities between documents and queries. The experiments show that they outperform traditional word-based vector space models; (5) We build the NLPR IR System, including architecture design, module definitions, and implementation; (6) We participated in the 2003 TREC evaluation (Robust Track and Novelty Track) to test the NLPR IR System and obtained excellent results in the Novelty Track. |