基于支持向量机的快速非线性分类算法

CASIA OpenIR > 毕业生 > 博士学位论文

	基于支持向量机的快速非线性分类算法
	毛雪
	2017-05-24
学位类型	工学博士
中文摘要	支持向量机（SVM）一直都是机器学习领域里的热点研究课题，在产业界也得到了广泛的应用。它建立在统计学习的VC维理论和结构风险最小理论的基础之上，泛化能力好，在很多任务中表现出优秀的性能。当遇到非线性问题的时候，我们常常训练一个 kernel SVM 分类器，它利用核技巧将原始低维空间映射到一个高维特征空间。虽然 kernel SVM 通常情况下能够取得比较好的分类结果，但是它的计算复杂度很高，尤其在处理大规模非线性数据集的时候更是如此。这是因为 kernel SVM 的复杂度依赖于产生的支持向量的数目，而支持向量的数目与训练集的大小大致成线性关系。另一方面，linear SVM 效率比较高，但是在处理非线性数据的时候，往往得不到满意的分类结果。因而研究基于 SVM 的快速非线性分类算法对于将 SVM 应用于大规模数据上具有重要的意义。本文提出了三种基于 SVM 的快速非线性分类算法，第四个工作将 SVM 的思想拓展到度量学习，使度量学习能够应用于大数据上。本文的主要研究内容与创新点可归纳如下：（1）我们提出了一种集成多个linear SVM的快速非线性分类方法。我们首先使用高斯混合模型对数据进行聚类，然后在每个聚类上训练一个linear SVM。我们使用一个产生式图模型将数据聚类和 linear SVM 的学习纳入到一个统一的框架下。同时，每个linear SVM 不是分开学习的，我们将每个linear SVM 的学习看作一个任务，然后使用多任务学习的方法同时学习所有的linear SVM。实验结果表明多任务学习能够有效地避免在单个任务学习时容易出现的过拟合现象。该算法的预测复杂度和linear SVM 的数目成正比，而kernel SVM的复杂度和支持向量的数目成正比，所以该算法在测试时，速度比kernel SVM 快很多。同时，我们将局部一致聚类引入到高斯混合模型的聚类过程中，从而在聚类的过程中，考虑局部流形结构，使得聚类效果更好。我们还将该算法推广到了半监督学习，从而同时使用有标签数据和无标签数据来估计高斯混合模型的参数，避免了高维数据下高斯混合模型协方差矩阵容易出现奇异的问题。（2）我们提出了一种基于局部软分配编码的局部线性分类器。该算法将局部软分配编码融入到 linear SVM 的决策函数中，从而形成了局部线性分类器的决策函数。另外，它采用完全监督的方法同时学习编码所用的锚点（anchor points）与局部线性分类器。通过将锚点与分类器变量纳入到一个统一的优化问题中，不断优化锚点和分类器，从而学习出更优的分类器。该算法的测试复杂度与锚点的数目成正比，通常锚点的数目比支持向量的数目少很多，所以该算法在测试时比kernel SVM 快很多。该算法的分类准确率和 kernel SVM 差不多。（3）在大数据上训练 kernel SVM，计算复杂度很高，而且需要大量内存。我们提出了一种通过识别间隔（margin）上的支持向量来高效训练 kernel SVM 的算法。它首先对训练数据无放回随机采样几次，每次只采样训练数据的一个子集，然后在每个子集上训练 kernel SVM，通过这些训练得到的 kernel SVM 分类器来识别 margin 上的支持向量。然后通过求解一个优化问题来估算与这些支持向量对应的拉格朗日乘子，这样就得到了决策函数，从而可以对未分类数据进行测试了。对于比较大的训练数据集来说，在训练数据的子集上多次训练 kernel SVM 比在整个训练集上训练 kernel SVM 快很多，所以此算法大大缩短了kernel SVM 在大数据集上的训练时间。而且该算法很容易做到并行化，从而训练速度可以进一步提升。（4）现有的度量学习方法通常涉及复杂的优化算法，很难应用到大数据上。我们提出了一种算法从排序学习的角度来进行度量学习，并根据 representer theorem 和上述 anchor points 的思想对该算法进行了加速，使之能够在大数据上进行度量学习。该算法与上述三个支持向量机加速算法之间的联系是该算法所使用的损失函数是排序的 hinge loss，相当于把支持向量机的思想扩展到了度量学习。另一个共同点是，该算法使用了 representer theorem 和上述 anchor points 的思想进行了加速，从而可以应用到大数据上，前面三个算法的核心思想也是对 SVM 进行加速，使之能够应用到大数据上。
英文摘要	The support vector machine (SVM) has always been a hot topic in machine learning community and has been widely used in industry. It builds on the VC dimension theory of statistical learning and structural risk minimization principle. It has good generalization ability, and achieves good performance in many tasks. When it comes to nonlinear classification problem, we need to train a kernel SVM classifier, which maps the original low-dimensional space to a high-dimensional feature space by kernel trick. Although kernel SVM can achieve high classification accuracy, it is computationally expensive, especially when dealing with large-scale nonlinear datasets. This is because the complexity of kernel SVM depends on the number of support vectors, and the number of support vectors is roughly linear with the size of the training set. On the other hand, although linear SVM is extremely efficient, it cannot handle nonlinear data with acceptable accuracy. Therefore, in order to scale SVM up to large datasets, it is very important to study SVM-based fast nonlinear classification algorithms. In this paper, three fast nonlinear classification algorithms based on SVM are proposed. In the fourth work, we extend the theory of SVM to metric learning so that metric learning can be applied to large data. The main contents and contributions of this thesis can be summarized as follows: (1) An efficient nonlinear classification method is proposed by combining multiple linear SVMs. The method employs a divide-and-conquer strategy which partitions the data into clusters using a Gaussian mixture model (GMM), and then trains a linear SVM for each cluster. Instead of being independent of each other, the two steps are combined into a generative graphical model. Meanwhile, each linear SVM is not separately trained. With training a linear SVM for each cluster considered as a single task, the training of these multiple linear SVMs can be regarded as a multi-task learning problem. Therefore, multi-task learning method is employed to learn these linear SVMs simultaneously. The experimental results show that multi-task learning method can effectively avoid the over-fitting phenomenon which is easy to occur when learning each single task. The prediction complexity of this algorithm is linear in the number of linear SVMs, while the prediction complexity of kernel SVM scales linearly with the number of support vectors. Hence, this algorithm is much faster than kernel SVM during the prediction phase. Besides, locally consistent clustering is incorporated into the clustering process of Gaussian mixture model so that the local manifold structure can be exploited to enhance the clustering performance. This algorithm is also extended to semi-supervised learning. Thus both the labeled data and the unlabeled data are used to estimate the parameters of the Gaussian mixture model. In this way, it avoids the problem that the covariance matrices of Gaussian mixture model are easy to be singular in high dimensional data. (2) A locally linear classifier based on localized soft-assignment coding is proposed. The algorithm integrates the localized soft-assignment coding into the decision function of the linear SVM, thus forming the decision function of the locally linear classifier. In addition, it employs a fully supervised method to learn both the anchor points used for the coding and the locally linear classifier. By optimizing the anchor points and the classifier variables under a unified framework, the anchor points and the classifier are optimized simultaneously to learn a better classifier. The prediction complexity of the algorithm is linear in the number of anchor points, which is usually much less than the number of support vectors. Therefore, the algorithm is much faster than kernel SVM during prediction. The algorithm achieves comparable classification accuracy to kernel SVM. (3) Training kernel SVM on large datasets suffers from high computational complexity and requires a large amount of memory. An efficient kernel SVM training algorithm is proposed by identifying the support vectors on the margin. In the first step, it randomly samples the training data without replacement several times, each time only a small subset of training data is sampled. Then a kernel SVM is trained on each subset, and the resulting kernel SVM models are used to identify the support vectors on the margin. In the second step, an optimization problem is solved to estimate the Lagrange multipliers corresponding to these support vectors. After obtaining the support vectors and Lagrange multipliers, we can approximate the decision function of kernel SVM, which can be used for classifying unseen data. Training many kernel SVMs on small subsets of training data is much more efficient than training a single kernel SVM on the whole training data especially for large datasets. So the algorithm greatly accelerates the training of kernel SVM on large data sets. Moreover, our algorithm can be easily parallelized for further speedup. (4) The existing metric learning methods usually involve complex optimization algorithms, which makes it difficult to apply these metric learning methods to large data. To address this issue, a ranking approach to metric learning is proposed. The algorithm is accelerated according to representer theorem and the above-mentioned anchor points approach, which makes it possible to apply the algorithm to large data. The relation between the algorithm and the above three SVM acceleration algorithms is that the loss function used by the algorithm is the ranking hinge loss, which is equivalent to extending the idea of SVM to metric learning. Furthermore, the algorithm is accelerated by using representer theorem and the idea of the above-mentioned anchor points approach so that it can be applied to large data. The main idea of the previous three algorithms is also to accelerate the SVM so that it can be applied to large data.
关键词	支持向量机机器学习非线性分类快速算法集成学习
文献类型	学位论文
条目标识符	http://ir.ia.ac.cn/handle/173211/14849
专题	毕业生_博士学位论文
作者单位	中国科学院自动化研究所
第一作者单位	中国科学院自动化研究所
推荐引用方式 GB/T 7714	毛雪. 基于支持向量机的快速非线性分类算法[D]. 北京. 中国科学院大学,2017.

条目包含的文件
文件名称/大小	文献类型	版本类型	开放类型	使用许可
基于支持向量机的快速非线性分类算法.pd（3791KB）	学位论文		限制开放	CC BY-NC-SA