Classification Approach and System based on Discrimination Analysis (基于可分性分析的分类方法与系统)


Title	基于可分性分析的分类方法与系统
Alternative title	Classification Approach and System based on Discrimination Analysis
Author	惠康华 (Hui Kanghua)
Date	2010-05-29
Degree type	Doctor of Engineering
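Abstract (translated from Chinese)	Feature extraction and classifier design are two key stages of a pattern recognition system. High-dimensional data cause the curse of dimensionality, which existing classification methods struggle to handle, and people cannot perceive or understand such data intuitively. To recognize and classify the data, one must therefore perform discrimination analysis on the high-dimensional data to obtain the low-dimensional features that best serve the classification goal. Traditional feature extraction research has concentrated on linear dimensionality reduction. In recent years, nonlinear dimensionality reduction methods based on manifold learning have attracted growing attention and made significant progress, yet many theoretical and technical problems in the field remain open. This thesis studies dimensionality reduction and classifier design in depth, addressing several fundamental problems in a class of nonlinear dimensionality reduction methods and in sparse representation-based classification: selection of the neighborhood parameter, estimation of the “intrinsic” dimensionality, reduction of time complexity, local sparse representation, and discrimination analysis. The main work and contributions are as follows. First, to address the high time complexity of computing sample neighbors and matrix eigenvectors in locally linear embedding (LLE), a clustering-based locally linear embedding method is proposed. By combining a clustering algorithm with LLE, the method effectively reduces the time complexity of the original LLE while preserving, in the low-dimensional embedding space, the “intrinsic” structure of the original high-dimensional data. In addition, when classifying samples in the low-dimensional embedding space, the proposed method achieves classification performance comparable to or better than that of LLE. Second, LLE-type algorithms lack an effective way to select the neighborhood parameter, even though this choice is decisive for the quality of the dimensionality reduction. To address this, the thesis proposes a locally linear embedding method with an adaptive neighborhood parameter, which determines the parameter by searching for approximately linear local patches. Experiments on several standard data sets show that the proposed method obtains better results than LLE. Third, LLE has two problems: samples that are far apart in the high-dimensional space may no longer be far apart in the low-dimensional embedding space, and the “intrinsic” dimensionality of the manifold cannot be estimated. To address both, the thesis proposes a globality-preserving locally linear embedding method. On the one hand, while preserving the local neighborhood relations of the high-dimensional data, the method keeps mutually distant data points distant in the low-dimensional embedding space, so that the embedded data retain the separability of the high-dimensional data; on the other hand, the method can estimate the “intrinsic” dimensionality of the manifold. Experimental results show that, in the “intrinsic” low-dimensional space, the proposed method classifies better than LLE. Fourth, building on sparse representation-based classification (SRC), the thesis proposes a classification method based on local sparse representation. Although SRC achieved the best classification results reported at the time on many data sets, it relies on two assumptions: (i) the training samples of each class lie in a linear subspace, and (ii) the linear subspaces spanned by the training samples of different classes neither intersect nor lie too close together. These assumptions imply that SRC cannot effectively classify nonlinear data. The proposed method addresses this problem and has the following main features: (1) it determines multiple discriminative local dictionaries; (2) it classifies a sample according to the linear reconstruction errors of these local dictionaries; (3) it suits real, unconstrained data sets; (4) it is particularly suited to classifying data sets reduced by LLE-type dimensionality reduction methods; (5) its time complexity is relatively low. Experimental results show that the proposed classification method obtains results comparable to or better than those of other commonly used classifiers.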
English abstract	Feature extraction and classifier design are two important parts of a pattern recognition system. When dealing with high dimensional data, one faces the “curse of dimensionality”, under which existing classifiers break down. At the same time, people cannot perceive and understand such high dimensional data intuitively. For the sake of classification, one needs to analyze the discrimination of the high dimensional data and obtain the low dimensional features that best serve the classification purpose. Traditional research on feature extraction mainly focuses on linear dimensionality reduction. In recent years, nonlinear dimensionality reduction methods based on manifold learning have received a great deal of attention. However, many theoretical and technical challenges remain in this area. In this thesis, we study methods of dimensionality reduction and classifier design, which involve several basic problems in nonlinear dimensionality reduction and sparse representation based classification, such as the selection of the neighborhood parameter, the estimation of the “intrinsic” dimensionality, the reduction of time cost, locally sparse representation, and discrimination analysis. The main contributions of this thesis are as follows. First, to address the high time cost of locally linear embedding (LLE) in finding neighbors and computing eigenvectors, a new method called clustering-based locally linear embedding (CLLE) is proposed. By combining clustering with LLE, the proposed method not only reduces the time cost of LLE effectively, but also preserves the “intrinsic” structure of the high dimensional data when it is embedded into a low dimensional space. Moreover, when classifying the embedded data, CLLE achieves results comparable to or even better than those of LLE. Second, LLE-like methods, which reduce dimensionality by preserving neighborhoods, lack effective ways to select the neighborhood parameter. For this problem, a method called self-regulation of neighborhood parameter for locally linear embedding (Self-regulated LLE) is introduced. It addresses the problem by finding local patches that are close to linear. Experimental results on several data sets show that Self-regulated LLE performs better than LLE in most cases under different evaluation criteria and takes less time. Third, methods like LLE that preserve the neighborhood of high dimensional data may confront tw...
Keywords	Feature Extraction; Dimensionality Reduction; Manifold Learning; Sparse Representation; Discrimination Analysis
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6258
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	惠康华. 基于可分性分析的分类方法与系统[D]. 中国科学院自动化研究所. 中国科学院研究生院,2010.
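
The abstract's fourth contribution, classifying a sample by the linear reconstruction error of per-class local dictionaries, can be illustrated with a minimal sketch. This is not the thesis's algorithm: the record does not say how the local dictionaries are chosen or how the sparse coefficients are computed. The sketch assumes each local dictionary is simply the k training samples of a class nearest to the query, and it swaps in ridge-regularised least squares for sparse coding. The function names (`local_dictionary`, `reconstruction_error`, `classify`), the parameters `k` and `ridge`, and the toy data are all hypothetical.

```python
import numpy as np

def local_dictionary(X_class, x, k):
    """Assumed rule: use the k samples of one class nearest to x as atoms."""
    dists = np.linalg.norm(X_class - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return X_class[nearest].T  # shape (n_features, k), one atom per column

def reconstruction_error(D, x, ridge=1e-6):
    """Residual norm of x after a ridge-regularised least-squares fit on D.

    Least squares stands in for sparse coding here (an assumption).
    """
    gram = D.T @ D + ridge * np.eye(D.shape[1])
    weights = np.linalg.solve(gram, D.T @ x)
    return float(np.linalg.norm(x - D @ weights))

def classify(X_train, y_train, x, k=3):
    """Assign x to the class whose local dictionary reconstructs it best."""
    errors = {}
    for label in np.unique(y_train):
        X_class = X_train[y_train == label]
        D = local_dictionary(X_class, x, min(k, len(X_class)))
        errors[label] = reconstruction_error(D, x)
    return min(errors, key=errors.get)

# Toy data (invented for illustration): two classes lying in different planes.
X_train = np.array([
    [1.0,  0.0, 0.0,  0.0],
    [1.0,  0.2, 0.0,  0.0],
    [0.9, -0.1, 0.0,  0.0],
    [0.0,  0.0, 1.0,  0.0],
    [0.0,  0.0, 1.0,  0.2],
    [0.0,  0.0, 0.9, -0.1],
])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(classify(X_train, y_train, np.array([1.0, 0.1, 0.0, 0.0]), k=2))
```

Keeping k smaller than the feature dimension matters in this sketch: if a local dictionary spans the whole space, every class reconstructs the query almost perfectly and the errors stop being discriminative.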

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20071801462803 (4096 KB)			Restricted access	CC BY-NC-SA