Research on Manifold Learning and Sparsity based Semi-Supervised Classification Methods


	流形学习与基于稀疏化的半监督分类相关方法研究
Alternative title	Research on Manifold Learning and Sparsity based Semi-Supervised Classification Methods
	Gu Nannan (古楠楠)
	2012-06-01
Degree type	Doctor of Engineering
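Chinese abstract	Feature extraction (or dimensionality reduction) and classification are fundamental problems in data modeling and data mining, and lie at the core of pattern recognition. Manifold learning and semi-supervised classification have emerged in recent years as prominent methods for feature extraction (especially low-dimensional feature representation) and classification. They are effective ways to alleviate the "high-dimensional, weakly supervised" problem, and have significant theoretical and practical value. First, they involve mathematics, computer science, information science, biological cognition, and other fields, and form an emerging, frontier area of interdisciplinary research. Second, they have promoted the intersection of several areas within mathematics. Finally, they have important applications in machine learning, data mining, and pattern recognition. Although manifold learning and semi-supervised classification have succeeded in both theory and application, they still face many challenging problems. We studied several of these key problems in depth and obtained a series of novel results:
1. For the problem of unified understanding and survey of manifold learning methods, we propose using the manifold regularization framework. To obtain a dimensionality reduction mapping from the high-dimensional representation space to the low-dimensional intrinsic space, the framework seeks to fit prior guidance information on the low-dimensional representation, while taking into account the functional complexity of the mapping and the degree to which it preserves the structural information of the data. Based on this framework, we connect linear and nonlinear, unsupervised and supervised, and single-class and multi-class manifold learning algorithms, understand and survey them from a unified perspective, and further discuss their commonalities and differences.
2. For manifold learning in semi-supervised classification tasks, traditional manifold learning algorithms cannot handle multi-manifold data, have difficulty incorporating class labels, and lack an explicit mapping. We adopt the strategy of "randomly generating a prior low-dimensional representation according to the class labels of the labeled data and fitting it, while preserving the sparse structure of the data", and propose a Discriminative Sparsity Preserving Projection (DSPP) algorithm. DSPP is designed for multi-manifold data, has an explicit dimensionality reduction mapping, and inherits strong discriminative power from the sparse representation of the data. Experiments show that DSPP has a clear advantage over other manifold learning methods in semi-supervised classification.
3. For the same problem, we adopt a second strategy, namely "multi-manifold modeling, nonlinear feature extraction, and semi-supervised discriminant analysis", and propose a Multi-Manifold Discriminative Analysis (Multi-MDA) algorithm. Multi-MDA constructs a neighborhood graph that effectively captures the multi-manifold structure, uses this graph to represent each data point by a corresponding feature vector that carries nonlinear structural information, and then introduces class information on the constructed feature vectors and builds a linear dimensionality reduction mapping. Experimental results show that Multi-MDA has good discriminative performance in semi-supervised classification tasks.
4. For robust manifold learning, that is, the problem that manifold learning methods are not robust to outliers, we take the classical manifold learning method Isomap as a starting point and propose an algorithm based on an L1-norm distance metric: Isomap-L1. In the presence of outliers, the L1 norm has been shown to perform better than the L2 norm, so our strategy based on the L1-norm distance metric is more robust. Experimental results also show that Isomap-L1 is comparatively robust to outliers.
5. For the model assumption problem in semi-supervised classification, that is, the cluster assumption and the manifold assumption underlying semi-supervised classification do not hold for all data sets, we propose, based on a sparsity assumption, Kernel-based Sparse Regularization (K...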
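Item 4 of the abstract rests on the claim that the L1 norm is more robust to outliers than the L2 norm. The following is a minimal sketch of that general fact only, not of Isomap-L1 itself, whose details are not given in this record. For one-dimensional data, the point minimizing the sum of absolute (L1) deviations is the median, while the point minimizing the sum of squared (L2) deviations is the mean. The sample values and function names are hypothetical.

```python
import statistics


def l1_center(values):
    """Return a minimizer of sum(|v - c|) over c: the median."""
    return statistics.median(values)


def l2_center(values):
    """Return the minimizer of sum((v - c)**2) over c: the mean."""
    return statistics.mean(values)


def l1_cost(values, c):
    """Sum of absolute deviations from c."""
    return sum(abs(v - c) for v in values)


def l2_cost(values, c):
    """Sum of squared deviations from c."""
    return sum((v - c) ** 2 for v in values)


# Hypothetical samples: five clean points, then the same points plus one outlier.
clean = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = clean + [100.0]

# How far does each "center" move when the outlier is added?
l1_shift = abs(l1_center(with_outlier) - l1_center(clean))
l2_shift = abs(l2_center(with_outlier) - l2_center(clean))

print(f"L1 center (median) moved by {l1_shift}")
print(f"L2 center (mean) moved by {l2_shift}")
```

The single outlier barely moves the L1 center but drags the L2 center far from the bulk of the data, which is the intuition behind preferring L1-based distances when outliers are expected.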
English abstract	Feature extraction (i.e., dimensionality reduction) and classification are two fundamental problems in data modeling and data mining, and they are key parts of pattern recognition. Manifold learning and semi-supervised classification are currently two of the most active topics in feature extraction and classification. They are effective methods for alleviating the problem of having few labeled samples in a high-dimensional space, and merit in-depth research in both theory and application. First, they are interdisciplinary, frontier fields spanning mathematics, computer science, information science, and cognitive science. Second, they have advanced the intersection of several branches of mathematics. Finally, they have many important applications in machine learning, data mining, and pattern recognition. Although manifold learning and semi-supervised classification have achieved great success in both theory and application, many issues remain unaddressed. In this thesis, we focus on several key problems and obtain a series of results, as follows:
1. For the problem of unified interpretation and overview of manifold learning algorithms, we propose to use the Manifold Regularization (MR) framework. To obtain the dimensionality reduction mapping f, MR tries to fit the prior low-dimensional representation while accounting for the complexity of f and the ability of f to preserve certain intrinsic structure of the data. With this framework, we connect various manifold learning methods, from linear to nonlinear, unsupervised to supervised, and single-class to multi-class approaches. We use the MR framework to interpret them from a unified perspective, present an overview of them, and investigate their common properties and intrinsic differences.
2. Most traditional manifold learning algorithms have three limitations when applied to semi-supervised classification: they are not applicable to data on multiple manifolds; it is difficult to introduce class labels; and there is no explicit dimensionality reduction mapping. To address these problems, we adopt the strategy of "fitting the prior low-dimensional representation randomly generated according to the class labels of labeled points, and meanwhile preserving the spa...
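The framework in item 1 balances three terms: fit to the prior low-dimensional representation, the complexity of f, and structure preservation. One way such an objective might be written, assuming the standard manifold regularization form introduced by Belkin, Niyogi, and Sindhwani, is sketched here. The thesis's exact formulation is not given in this record, so the symbols y_i (prior representation of x_i), V (loss), \gamma_A and \gamma_I (trade-off weights), W_{ij} (structure weights), and the norm \lVert f \rVert_K are all assumptions.

```latex
\[
f^{*} = \arg\min_{f}\;
\underbrace{\sum_{i} V\bigl(y_i, f(x_i)\bigr)}_{\text{fit to prior representation}}
\;+\; \gamma_A \underbrace{\lVert f \rVert_{K}^{2}}_{\text{complexity of } f}
\;+\; \gamma_I \underbrace{\sum_{i,j} W_{ij}\,\lVert f(x_i) - f(x_j) \rVert^{2}}_{\text{structure preservation}}
\]
```

Under this reading, the first sum runs over the points that have a prior representation, and different choices of V, the function class, and the weights W_{ij} would correspond to different members of the family of methods that the abstract says the framework connects.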
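Keywords	Dimensionality Reduction; Manifold Learning; Semi-supervised Learning; Semi-supervised Classification; Sparse Representation; Machine Learning
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6473
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	Gu Nannan (古楠楠). Research on Manifold Learning and Sparsity based Semi-Supervised Classification Methods [D]. Institute of Automation, Chinese Academy of Sciences; Graduate University of Chinese Academy of Sciences, 2012.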

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20091801462802 (2693KB)			Not yet open	CC BY-NC-SA