Ranking problems can be categorized into two classes: ordinal regression and learning to rank. The former aims to predict a label on an ordinal scale for each sample; the latter aims to predict a ranked list for a set of items. This paper proposes a framework for the ranking problem in which both problems are formally defined in machine learning terms. Based on this framework, we investigate the theory of ranking and propose several effective algorithms. Overall, the work and contributions are as follows:

1. On the theory of ordinal regression, we give a formal definition of the problem and propose a reduction framework that transforms ordinal regression into standard multi-class classification. The reduction framework not only clarifies the relation between ordinal regression and classification, but also leads to a new weighting mechanism for solving ordinal regression. We also prove that the risk of the ordinal regression problem is bounded by that of the weighted training set. On the algorithm side, we propose three effective algorithms: (1) an algorithm that optimizes the risk of the weighted training set; (2) a new decision tree algorithm, Ranking Tree; and (3) a multi-feature ordinal regression algorithm.

2. On the theory of learning to rank, we give a supervised learning definition of the problem. Based on this definition, we investigate statistical properties of surrogate loss functions in terms of consistency and soundness, which provides a theoretical foundation for analyzing the strengths and weaknesses of existing approaches to learning to rank and for devising more advanced algorithms. On the algorithm side, we propose a novel surrogate loss function for learning to rank that is convex, sound, statistically consistent, and computationally efficient. The corresponding optimization problem is not only easy to solve but also yields better solutions than existing methods in terms of ranking accuracy.

3. In practical applications of learning to rank, we find that items at the top of a ranked list are more important than those at other positions, and that the data are usually given only as partial order relations. We therefore propose a new top-k loss function and develop an iterative optimization algorithm that can both learn the ranking function and mine the "true" ranked lists from the partial-order constraints. Experiments on a benchmark data set validate the usefulness of the proposed algorithm.
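The idea of reducing ordinal regression to weighted classification can be sketched as follows. This is a minimal illustration in the style of the well-known "extended binary classification" reduction; the function name, the choice of absolute cost, and the threshold encoding are illustrative assumptions, not necessarily the exact construction proposed in this work.

```python
import numpy as np

def reduce_to_binary(X, y, K):
    """Expand each ordinal sample (x, y), y in {1..K}, into K-1 weighted
    binary examples ((x, k), sign[y > k]) for k = 1..K-1.  With the
    absolute cost C(y, k) = |y - k|, the derived per-example weights
    |C(y, k) - C(y, k+1)| all equal 1."""
    Xb, kb, yb, wb = [], [], [], []
    for xi, yi in zip(X, y):
        for k in range(1, K):
            Xb.append(xi)
            kb.append(k)                    # threshold index as an extra feature
            yb.append(1 if yi > k else -1)  # "is the rank above k?"
            wb.append(abs(abs(yi - k) - abs(yi - (k + 1))))
    return np.array(Xb), np.array(kb), np.array(yb), np.array(wb)
```

A rank prediction for a new point is then recovered by counting how many of the K-1 binary classifiers answer "above k".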
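To make the notion of a convex listwise surrogate concrete, the following is a standard example in the ListNet style, i.e., cross entropy between the top-one probability distributions induced by the scores and by the relevance judgments. It is offered only as an illustration of the class of losses studied here, not as the specific loss proposed in this work.

```python
import numpy as np

def listwise_cross_entropy(scores, relevance):
    """Cross entropy between top-one (softmax) distributions induced by
    model scores and by relevance labels.  As a function of the scores,
    this equals -sum_j q_j * s_j + logsumexp(s), which is convex in s."""
    s = np.asarray(scores, dtype=float)
    r = np.asarray(relevance, dtype=float)
    p = np.exp(s - s.max()); p /= p.sum()   # score-induced distribution
    q = np.exp(r - r.max()); q /= q.sum()   # relevance-induced distribution
    return float(-np.sum(q * np.log(p)))
```

The loss is smaller when the score ordering agrees with the relevance ordering, which is the behavior a sound surrogate should exhibit.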
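The emphasis on top positions in point 3 can be illustrated with a generic pairwise hinge loss restricted to the ground-truth top k. This sketch is an assumption for illustration only; the abstract does not specify the actual form of the proposed top-k loss.

```python
import numpy as np

def topk_pairwise_hinge(scores, ranks, k, margin=1.0):
    """Pairwise hinge loss counting only pairs whose better item lies in
    the ground-truth top k, so mistakes at the head of the list are
    penalized and disagreements lower down are ignored.  `ranks` follows
    the convention that smaller values mean better positions."""
    order = np.argsort(ranks)          # indices of the best items first
    top = set(order[:k].tolist())
    loss = 0.0
    n = len(scores)
    for i in range(n):
        if i not in top:
            continue
        for j in range(n):
            if ranks[i] < ranks[j]:    # item i should be scored above item j
                loss += max(0.0, margin - (scores[i] - scores[j]))
    return loss
```

Under partial-order supervision, only the observed pairs would be available, which is why an iterative scheme that alternates between fitting the ranking function and completing the ranked list is natural here.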