With the advance of the information age, data of enormous volume, high dimensionality and complex structure are accumulating rapidly. Such data challenge existing machine learning methods, which cannot discover the nonlinear structure hidden in them. In 2000, three articles published in Science studied this issue from the perspectives of neuroscience and computer science. This dissertation studies manifold learning from the perspectives of dimensionality reduction, classification and semi-supervised regression. Its main contributions are as follows:

(1) The necessity and rationality of manifold learning are explained, the concept of local manifold learning is defined, and related work is surveyed. In particular, manifold learning is investigated from the viewpoint of local learning.

(2) An improved algorithm, partitional local tangent space alignment (PLTSA), is presented. Unlike VQPCA, PLTSA produces global coordinates for the data; it also works on a much smaller optimization matrix than LTSA, yielding a better-scaled algorithm. In addition, it provides a set of transformations for computing the global embedded coordinates of new, out-of-sample data. Experiments illustrate the validity of the algorithm.

(3) Based on the representation theorem of the local tangent space, a nonparametric method for computing the tangent distance is proposed. It computes tangent vectors directly from the training data and needs no prior knowledge. A k-nearest-neighbor classifier based on this tangent distance is implemented, combining local manifold learning with classification. Experimental results show that it outperforms SVM classifiers on some handwritten digit datasets.

(4) On the basis of manifold regularization, a framework for semi-supervised regression with a class of general loss functions is derived, called Laplacian regression.
Algorithms for Laplacian regression with the linear ε-insensitive, quadratic ε-insensitive and Huber loss functions are given, and experiments on synthetic and real-world data sets are carried out.
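The nonparametric tangent distance of contribution (3) can be sketched as follows, assuming tangent vectors are estimated by a local PCA (SVD of a centered neighborhood) at each training point. The function names, the neighborhood size `k_nbr` and the tangent dimension `d` are illustrative choices, not the dissertation's actual implementation:

```python
import numpy as np

def tangent_basis(X, i, k=3, d=1):
    """Estimate an orthonormal basis of the tangent space at X[i]
    from its k nearest neighbors (local PCA via SVD)."""
    dists = np.linalg.norm(X - X[i], axis=1)
    nbrs = np.argsort(dists)[1:k + 1]          # skip the point itself
    M = X[nbrs] - X[nbrs].mean(axis=0)
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt[:d].T                            # shape (D, d)

def tangent_distance(x, p, T):
    """One-sided tangent distance: distance from x to the
    affine tangent plane {p + T a} at prototype p."""
    r = x - p
    return np.linalg.norm(r - T @ (T.T @ r))

def classify(X, y, x, k_nbr=3, d=1):
    """1-NN under the tangent distance, with tangent spaces
    estimated class-wise from the training data alone."""
    best, label = np.inf, None
    for c in np.unique(y):
        Xc = X[y == c]
        for i in range(len(Xc)):
            td = tangent_distance(x, Xc[i], tangent_basis(Xc, i, k_nbr, d))
            if td < best:
                best, label = td, c
    return label
```

Because the distance is measured to the local tangent plane rather than to the prototype itself, points that differ from a training sample only along the estimated manifold direction are treated as close.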
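The ε-insensitive and Huber instances of Laplacian regression require iterative solvers, so as a minimal sketch of the same manifold-regularization framework, here is the squared-loss member (Laplacian regularized least squares), which admits a closed form. The graph construction, kernel and hyperparameters below are illustrative assumptions, not the dissertation's settings:

```python
import numpy as np

def rbf_kernel(A, B, gamma=10.0):
    """Gaussian kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def graph_laplacian(X, k=3, gamma=10.0):
    """k-NN graph with RBF weights; unnormalized Laplacian L = D - W."""
    W = rbf_kernel(X, X, gamma)
    idx = np.argsort(-W, axis=1)[:, 1:k + 1]   # k nearest, skip self
    A = np.zeros_like(W)
    rows = np.arange(len(X))[:, None]
    A[rows, idx] = W[rows, idx]
    A = np.maximum(A, A.T)                     # symmetrize
    return np.diag(A.sum(axis=1)) - A

def laprls_fit(X, y_labeled, gamma=10.0, lam_a=1e-2, lam_i=1.0, k=3):
    """Laplacian regularized least squares; the first len(y_labeled)
    rows of X are the labeled points, the rest are unlabeled."""
    n, l = len(X), len(y_labeled)
    K = rbf_kernel(X, X, gamma)
    L = graph_laplacian(X, k, gamma)
    J = np.zeros((n, n)); J[:l, :l] = np.eye(l)
    y = np.zeros(n); y[:l] = y_labeled
    A = J @ K + lam_a * l * np.eye(n) + (lam_i * l / n**2) * (L @ K)
    return np.linalg.solve(A, y)               # expansion coefficients

def laprls_predict(alpha, X_train, X_test, gamma=10.0):
    return rbf_kernel(X_test, X_train, gamma) @ alpha
```

The Laplacian term penalizes functions that vary sharply between neighboring points, so the unlabeled points shape the regressor even though they contribute no loss term.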