基于boosting思想的列表排序算法研究

CASIA OpenIR > 毕业生 > 硕士学位论文

	基于boosting思想的列表排序算法研究
其他题名	Study on Liswise Ranking Approach based on Boosting
	王永庆
	2011-05-29
学位类型	工程硕士
中文摘要	随着互联网技术的飞速发展和普及，使得网络上积聚了海量开源数据信息。这些海量信息同时带来信息获取上的极大困难。如何面向海量的网络信息，从中有效地发现有用的信息，成为了网络信息化时代的一项重大研究课题。因此，排序学习作为一种利用机器学习获得排序模型及其排序结果的自动排序技术，摆脱了长久以来依靠专家经验排序海量网页文本的技术限制，已成为一个十分重要的热点研究领域。针对以往排序学习方法存在的问题，本论文围绕列表排序方法开展工作。具体地，本文完成了以下工作： (1) 本文首先全面总结现有列表排序算法，分析其在实际搜索排序问题中的应用局限，提出了挖掘单个特征内隐含的序关系（隐序），通过组合这些隐序获得良好排序结果的可行性及方法优势。而后，本文利用 KNN 方法设计了一种优于简单弱排序模型的隐序挖掘模型——KNN 弱排序模型。 (2) 利用 boosting 的分布迭代思想，提出 FeatureRank 算法，用于组合弱排序结果以获得较强排序结果。FeatureRank 采用优化排序结果评价值的策略，学习获得组合弱排序结果的排序模型。由于 FeatureRank在学习过程中存在排序效果的振荡现象，本文提出了可以有效消除 FeatureRank振荡现象的BL-FeatureRank 算法。 (3) 提出DiffRank算法，该算法利用了排序位和序得分之间的转换可能性，将排序位的值转换为文本的序得分值，从而摆脱借助具体评价值优化排序模型的局限。同时，DiffRank是可预测误差上界的算法，因此其收敛速度快，在有限特征内学习效果明显。实验表明，DiffRank 在简单弱排序模型及 KNN 弱排序模型下均能获得较好的排序结果。 (4) 采用实验方法，实际验证本文所提出的三种列表排序算法的有效性；并与列表学习的代表性经典方法比较，实验结果表明本文所提出方法在性能上优于经典的列表排序算法，在某些排序评价值上具有明显的优越性。本论文提出的列表排序算法能够处理具有复杂序关系的训练集数据，并且在一定程度上克服以往线性排序模型的局限性，将排序学习方法拓展到解决复杂排序问题和实际搜索系统的应用。
英文摘要	With the rapid development of the Internet technologies, huge amount of open source data are available on the Web. These massive data have brought about great difficulty in information retrieval. To face with the difficulty, automatic methods to effectively discover useful information are in great demand. Learning to rank as a promising new method, which utilizes machine learning methods for the automatic knowledge discovery, has demonstrated its great potential in automating the retrieval and discovery process and become a hot research topic in recent years. Listwise approach is the most advanced learning to rank method that overcomes the limitations of previous methods. In this thesis, we shall focus on the studies of listwise approach. The accomplishments of this work include the following aspects. First, we review the existing related work on listwise approaches and analyze their limitations on practical web search applications. To solve the problem, we put forward a method to get weak ranking results based on single features and then aggregate weak ranking lists to establish a strong ranking list. In addition, we propose a more efficient weak ranking method based on KNN to explore the “hidden order” implied in single features. Second, to specifically solve the problem of constructing strong ranking model from aggregating weak models, we propose the FeatureRank algorithm which is in- spired by the idea of boosting. FeatureRank employs the strategy of optimizing the measure value on ranking results in order to get the strong ranking model. To eliminate the “vibration” in the process of FeatureRank, we further propose an improved algo- rithm BL-FeatureRank to achieve better ranking model and results. Third, we propose another solution to constructing strong ranking model from aggregating weak models - DiffRank algorithm. DiffRank transfers the ranking order in ranking score in order to learn the ranking model. We theoretically prove that DiffRank has distinct advantage of bounded learning errors. The empirical experimental study also discovers that the performance of DiffRank is independent of the way of producing the hidden order. We conduct experiments to verify the effectiveness of each of our proposed algorithms, and compare the ranking results by our proposed algorithms with those by the state-of-the-art listwise approaches. The experimental results show that our algorithms demonstrate great improvement and ef...
关键词	排序学习列表排序方法 Boosting 隐序 Featurerank Bl-featurerank Diffrank Leanring To Rank Listwise Approach Boosting Hidden Order Featurerank Bl-featurerank Diffrank
语种	中文
文献类型	学位论文
条目标识符	http://ir.ia.ac.cn/handle/173211/7603
专题	毕业生_硕士学位论文
推荐引用方式 GB/T 7714	王永庆. 基于boosting思想的列表排序算法研究[D]. 中国科学院自动化研究所. 中国科学院研究生院,2011.

条目包含的文件
文件名称/大小	文献类型	版本类型	开放类型	使用许可
CASIA_20082800902907（1797KB）			暂不开放	CC BY-NC-SA