In many machine learning applications, labeled data are scarce because labeling is both time-consuming and expensive. Unlabeled data, however, are easy to collect in applications such as text categorization and image classification. This has motivated machine learning researchers to develop methods that exploit both labeled and unlabeled data during model learning. This paradigm, developed over the past decade or so, is referred to as semi-supervised learning. Although semi-supervised learning has been an active area of research, its use in real-world applications is still limited because most existing methods lack scalability and robustness. This thesis proposes efficient and robust semi-supervised learning methods for real-world applications. Its main contributions are summarized as follows.

We propose a graph construction method that aims to preserve the local information of the data set. By formulating the task as a quadratic programming problem, the method learns the edges and weights simultaneously. Exploiting the sparsity of the graph, we further propose a more efficient cutting plane algorithm to solve the optimization problem.

We propose a transductive learning method based on adaptive graphs, which enhances the performance of graph-based transductive learning by learning the graph and inferring the labels simultaneously. The underlying idea is to improve graph construction by using label information. We give a simple iterative algorithm to optimize the objective function. Each iteration contains two steps: first, we fix the labels and learn a new graph; then, for the new graph, we update the prediction.

We propose an efficient and robust algorithm for graph-based transductive classification. After approximating a graph with a minimum spanning tree, we develop a linear-time algorithm to label the tree such that the cut size of the tree is minimized.
In addition to scaling well to large data, the proposed algorithm demonstrates high robustness and accuracy. We also prove that the minimum spanning tree structure facilitates graph parameter selection.

We introduce the classic low-dimensional embedding assumption into semi-supervised learning and propose a semi-supervised learning algorithm based on this assumption. The motivation is to find a low-dimensional representation of the data so as to make the labeled data denser and therefore much easier for training...
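The tree-labeling idea in the third contribution can be illustrated with a small sketch. This is not the thesis's algorithm, whose details are not given here. It is a standard two-pass dynamic program over a tree, written under several assumptions: labels are binary, nodes are numbered 0..n-1, the input is already a connected tree (for example, a minimum spanning tree computed beforehand), and "cut size" means the total weight of edges whose endpoints receive different labels. The function name `label_tree_min_cut` and its arguments are hypothetical.

```python
from collections import defaultdict


def label_tree_min_cut(n, edges, labeled):
    """Label each tree node 0 or 1 so that the total weight of edges
    joining differently labeled nodes is minimal (illustrative sketch).

    n: number of nodes, identified as 0..n-1
    edges: list of (u, v, weight) tuples that form a connected tree
    labeled: dict mapping some nodes to their known label (0 or 1)
    """
    INF = float("inf")
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    # Iterative DFS from node 0: record each node's parent, the weight of
    # the edge to that parent, and a preorder (parents before children).
    parent = [-1] * n
    parent_w = [0.0] * n
    seen = [False] * n
    seen[0] = True
    order, stack = [], [0]
    while stack:
        u = stack.pop()
        order.append(u)
        for v, w in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v], parent_w[v] = u, w
                stack.append(v)

    # cost[u][c]: minimal cut weight inside u's subtree if u takes label c.
    # A known label makes the opposite label infinitely expensive.
    cost = [[0.0, 0.0] for _ in range(n)]
    for u, c in labeled.items():
        cost[u][1 - c] = INF

    # Bottom-up pass: fold each finished subtree into its parent.
    for u in reversed(order):
        p = parent[u]
        if p >= 0:
            for c in (0, 1):
                cost[p][c] += min(cost[u][c], cost[u][1 - c] + parent_w[u])

    # Top-down pass: choose the root label, then the best label per child.
    labels = [0] * n
    labels[0] = 0 if cost[0][0] <= cost[0][1] else 1
    for u in order[1:]:
        c = labels[parent[u]]
        keep = cost[u][c]
        flip = cost[u][1 - c] + parent_w[u]
        labels[u] = c if keep <= flip else 1 - c
    return labels, min(cost[0])


# Toy example: a path 0-1-2-3-4 with one weak edge; only the two end
# nodes carry known labels.
path_edges = [(0, 1, 1.0), (1, 2, 0.2), (2, 3, 1.0), (3, 4, 1.0)]
labels, cut = label_tree_min_cut(5, path_edges, {0: 0, 4: 1})
```

Each node and edge is visited a constant number of times, so the sketch runs in time linear in the size of the tree. In the toy example, the two known labels end up separated at the weakest edge (weight 0.2).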