With the advance of the information age, data of enormous volume, high dimensionality and complex structure are accumulating rapidly. Such data challenge existing machine learning methods, which cannot discover the nonlinear structure hidden in them. In 2000, three articles published in Science studied this issue from the perspectives of neuroscience and computer science. This dissertation studies manifold learning from the perspectives of dimensionality reduction, classification and semi-supervised regression. Its main contributions are as follows:

(1) The necessity and rationality of manifold learning are explained, the concept of local manifold learning is defined, and related work is surveyed. In particular, manifold learning is investigated from the viewpoint of local learning.

(2) An improved algorithm, partitional local tangent space alignment (PLTSA), is presented. Unlike VQPCA, PLTSA produces global coordinates for the data; it also works on a much smaller optimization matrix than LTSA, yielding a better-scaled algorithm. In addition, it provides a set of transformations for computing the global embedded coordinates of new, out-of-sample data. Experiments illustrate the validity of the algorithm.

(3) Based on the representation theorem of the local tangent space, a nonparametric method for computing the tangent distance is proposed. It computes tangent vectors directly from the training data and needs no prior knowledge. A k-nearest-neighbor classifier based on this tangent distance is implemented, combining local manifold learning with classification. Experimental results show that it outperforms SVM classifiers on some handwritten digit datasets.

(4) On the basis of manifold regularization, a framework for semi-supervised regression with a class of general loss functions is derived, called Laplacian regression.
Algorithms for Laplacian regression with the linear ε-insensitive, quadratic ε-insensitive and Huber loss functions are given, and experiments on synthetic and real-world data sets are carried out.
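The nonparametric tangent distance of contribution (3) can be sketched as follows, assuming tangent vectors are estimated by a local PCA (SVD of a centered neighborhood) at each training point. The function names, the neighborhood size `k_nbr` and the tangent dimension `d` are illustrative choices, not the dissertation's actual implementation:

```python
import numpy as np

def tangent_basis(X, i, k=3, d=1):
    """Estimate an orthonormal basis of the tangent space at X[i]
    from its k nearest neighbors (local PCA via SVD)."""
    dists = np.linalg.norm(X - X[i], axis=1)
    nbrs = np.argsort(dists)[1:k + 1]          # skip the point itself
    M = X[nbrs] - X[nbrs].mean(axis=0)
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt[:d].T                            # shape (D, d)

def tangent_distance(x, p, T):
    """One-sided tangent distance: distance from x to the
    affine tangent plane {p + T a} at prototype p."""
    r = x - p
    return np.linalg.norm(r - T @ (T.T @ r))

def classify(X, y, x, k_nbr=3, d=1):
    """1-NN under the tangent distance, with tangent spaces
    estimated class-wise from the training data alone."""
    best, label = np.inf, None
    for c in np.unique(y):
        Xc = X[y == c]
        for i in range(len(Xc)):
            td = tangent_distance(x, Xc[i], tangent_basis(Xc, i, k_nbr, d))
            if td < best:
                best, label = td, c
    return label
```

Because the distance is measured to the local tangent plane rather than to the prototype itself, points that differ from a training sample only along the estimated manifold direction are treated as close.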
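The ε-insensitive and Huber instances of Laplacian regression require iterative solvers, so as a minimal sketch of the same manifold-regularization framework, here is the squared-loss member (Laplacian regularized least squares), which admits a closed form. The graph construction, kernel and hyperparameters below are illustrative assumptions, not the dissertation's settings:

```python
import numpy as np

def rbf_kernel(A, B, gamma=10.0):
    """Gaussian kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def graph_laplacian(X, k=3, gamma=10.0):
    """k-NN graph with RBF weights; unnormalized Laplacian L = D - W."""
    W = rbf_kernel(X, X, gamma)
    idx = np.argsort(-W, axis=1)[:, 1:k + 1]   # k nearest, skip self
    A = np.zeros_like(W)
    rows = np.arange(len(X))[:, None]
    A[rows, idx] = W[rows, idx]
    A = np.maximum(A, A.T)                     # symmetrize
    return np.diag(A.sum(axis=1)) - A

def laprls_fit(X, y_labeled, gamma=10.0, lam_a=1e-2, lam_i=1.0, k=3):
    """Laplacian regularized least squares; the first len(y_labeled)
    rows of X are the labeled points, the rest are unlabeled."""
    n, l = len(X), len(y_labeled)
    K = rbf_kernel(X, X, gamma)
    L = graph_laplacian(X, k, gamma)
    J = np.zeros((n, n)); J[:l, :l] = np.eye(l)
    y = np.zeros(n); y[:l] = y_labeled
    A = J @ K + lam_a * l * np.eye(n) + (lam_i * l / n**2) * (L @ K)
    return np.linalg.solve(A, y)               # expansion coefficients

def laprls_predict(alpha, X_train, X_test, gamma=10.0):
    return rbf_kernel(X_test, X_train, gamma) @ alpha
```

The Laplacian term penalizes functions that vary sharply between neighboring points, so the unlabeled points shape the regressor even though they contribute no loss term.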