How to make Information Retrieval (IR) systems more precise is a key issue for researchers in related fields. The thesis first introduces the basic concepts of IR and IR evaluation, and then presents the following methods for improving the precision of IR systems:
1. Compares the Boolean Model (BM), Vector Space Model (VSM), Probabilistic Model (PM), and Language Model (LM). Our experiments show that, with properly adjusted smoothing parameters, the LM-based IR system achieves higher precision than the others. They also show that on a large test collection, where the distribution of words is even, system performance can be improved by giving more weight to the Maximum Likelihood Estimation (MLE) of the Document Model (DM).
2. Investigates improving IR precision through Query Expansion (QE), using two techniques: dictionary-based QE and Relevance Feedback (RF) based QE. For dictionary-based QE, expanding nouns, adjectives, or adverbs individually increases precision, whereas expanding verbs lowers it, and Word Sense Disambiguation (WSD) cannot eliminate the noise that QE introduces. For RF-based QE, a new term selection method is proposed, and our experiments show that it outperforms the classic methods. However, precision decreases after RF if the initial retrieval results are not good enough.
3. Studies the effects of incorporating Natural Language Processing (NLP) techniques into IR systems. Five term-relation-based models are introduced: the Bigram Model, Word-Pair Model, Window-Based Model, Dependence Model, and Concept-Based Model. Compared with the “bag of words” model, all five term-relation-based models improve IR performance; among them, the Window-Based model with a window size of 3 performs best.
4. Simulates the Query Model (QM) and the Document Model (DM) by initializing the weights of the query terms. The experiments show that the DM approaches the QM only when the test collection becomes much larger.
5. Introduces Page Level (PL) into the LM-based IR framework. The experiments show that distinguishing page quality can improve IR performance.
Finally, the thesis describes our participation in the TREC 2005 HARD track and the 863 Information Retrieval Evaluation.
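The smoothing trade-off described for the LM-based system can be illustrated with query-likelihood scoring under Jelinek-Mercer smoothing, where a weight lam on the document MLE is interpolated with a collection model: P(w|d) = lam * P_ml(w|d) + (1 - lam) * P(w|C). This is a minimal sketch assuming Jelinek-Mercer smoothing (the abstract does not say which smoothing method was used). The function names, the toy corpus, and the `log_prior` argument, a hypothetical stand-in for a page-quality prior such as Page Level, are illustrative assumptions only.

```python
import math
from collections import Counter


def collection_model(docs):
    """MLE of P(w | C): relative frequency of each term over all collection tokens."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def jm_score(query, doc, p_coll, lam=0.7, log_prior=0.0):
    """Log query likelihood under a Jelinek-Mercer smoothed document model.

    P(w | d) = lam * P_ml(w | d) + (1 - lam) * P(w | C)
    lam: weight on the document MLE (assumed 0 <= lam < 1).
    log_prior: hypothetical log document prior (e.g. a page-quality score);
    0.0 corresponds to a uniform prior.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1) so collection terms keep nonzero mass")
    tf = Counter(doc)
    dlen = len(doc)
    score = log_prior
    for w in query:
        p_ml = tf[w] / dlen if dlen else 0.0
        p = lam * p_ml + (1.0 - lam) * p_coll.get(w, 0.0)
        if p == 0.0:
            # p is zero only when P(w | C) is zero and the document adds no mass;
            # for documents drawn from the collection every document shares this
            # zero, so skipping the term leaves the ranking unchanged.
            continue
        score += math.log(p)
    return score


docs = [
    "information retrieval precision".split(),
    "language model smoothing for retrieval".split(),
    "query expansion with relevance feedback".split(),
]
p_coll = collection_model(docs)
query = "retrieval smoothing".split()
ranking = sorted(range(len(docs)), key=lambda i: jm_score(query, docs[i], p_coll), reverse=True)
print(ranking)  # → [1, 0, 2]
```

Raising `lam` shifts probability mass toward the document MLE, which is the kind of adjustment reported as helpful on large collections; the default of 0.7 here is arbitrary and would need tuning on real data.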
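One possible reading of the Window-Based model is that two query terms count as related when they occur within a fixed-size window of consecutive tokens in a document. The sketch below assumes that interpretation, taking window size 3 to mean the two terms lie within the same 3-token span (at most 2 positions apart). The helper names `window_pairs` and `query_pair_hits` and this matching rule are hypothetical illustrations, not the thesis's own definition.

```python
from collections import Counter
from itertools import combinations


def window_pairs(tokens, window=3):
    """Count unordered pairs of distinct terms inside a span of `window` consecutive tokens.

    Positions i < j co-occur when j - i < window, so window=3 pairs each
    token with the next two tokens.
    """
    pairs = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


def query_pair_hits(query, doc, window=3):
    """Total window co-occurrences in `doc` over all unordered pairs of distinct query terms."""
    doc_pairs = window_pairs(doc, window)
    query_pairs = {tuple(sorted(p)) for p in combinations(set(query), 2)}
    return sum(doc_pairs[p] for p in query_pairs)


doc = "language model based information retrieval system".split()
query = "information retrieval system".split()
print(query_pair_hits(query, doc, window=3))  # → 3
print(query_pair_hits(query, doc, window=2))  # → 2
```

In a ranking function, such pair counts would be combined with a unigram ("bag of words") score in some weighted form; that weighting is left open in this sketch.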