CASIA OpenIR  > 毕业生  > 博士学位论文
支持向量机分类及核聚类中的几何算法研究
其他题名A Study on Geometric Algorithms for Support Vector Machine and Kernel Clustering
常亮
2005-06-05
学位类型工学博士
中文摘要支持向量机(Support Vector Machine, 简称SVM)是机器学习中一种重要方法。该方法建立在统计学习理论的基础上,是结构风险最小化原则的具体实现。SVM集成了间隔最大化、核理论和正则化技术,已在实际应用中,如人脸检测、目标识别以及信息检索等领域显示出优良的性能;然而,对于数据量较大的情况,支持向量机方法存在运行速度缓慢的不足。这很大程度上制约了算法在大样本中的应用。因此,支持向量机算法的研究具有重要的理论意义和巨大的实用价值。由于SVM有明确的几何解释,使得几何算法在SVM这一特定的研究 领域显示出良好的研究前景。本文主要围绕支持向量机分类及核聚类中的几何算法研究展开了较为深入的工作。论文的主要内容如下: 1. 一种快速收敛的Gilbert算法。 Gilbert算法在很多情况下由于振荡引起收敛缓慢。针对振荡问题,我们提出一种改进的Gilbert算法。证明了:(1) Gilbert 算法的收敛点在振荡点的凸组合上;(2) 振荡点在凸包的支持超平面上。基于理论分析,给出一种改进的Gilbert算法。与原算法相比,改进算法具有快速收敛的优点。Gilbert算法在SVM的研究已有相关工作,如S. Martin将角度收敛的Gilbert算法用于分类。本文方法与S.Martin的方法在算法本质上是一致的。本文针对``振荡''问题进行了深入的理论研究,并取得了一定的初步成果。我们还给出改进的Gilbert算法与SVM的联系。 2. 基于凸包可见面的支持向量机数据约简方法。 支持向量机的增量学习中的一个关键问题是如何有效地进行数据约简。本工作提出一种基于凸包可见面的支持向量机数据约简方法。其具体思想是:每一步保留凸包可见面上的样本点,得到约简集。证明了在SVM分类中,舍去正负样本构成的凸包内部的样本点以及凸包非可见面上的样本点不影响分类正确率。设第 k 步正负样本构成的凸包非可见面上的样本点在第 k+1 步仍在非可见面上,则在SVM的增量学习中,保留正、负样本构成的凸包可见面上的样本点,去除其余点,分类性能不变。由于凸包可见面上的样本点不易直接求得,我们采用交替投影算法来近似求解。人工数据集和标准测试集的实验表明 本文方法具有良好的约简率。 3. 核向量机与支持向量机几何关系的研究。 核向量机是最新提出的一种大数据的快速分类算法。它将分类问题转化为一个最小包含球问题,由最小包含球近似算法快速求解。本工作中,我们分析了核向量机和传统支持向量机的几何关系,得到核向量机中最小包含球球面上的点与支持向量机最优分类面上的点的关系。几何关系的提出有助于对核向量机分类的理解。 4. 将近似最小包含球的思想引入核聚类中,得到基于核集合的快速核聚类算法。 已有核聚类算法收敛缓慢、计算效率低,难以得到实际应用。本工作中,我们通过计算几何中的近似最小包含球算法求解核聚类中的二次优化问题。得到的快速核聚类算法的时间复杂度与样本个数成线性关系。在人工数据集和标准测试集的实验表明,算法有较高的聚类正确率,并能快速收敛。算法能够有效地应用于图像分割中。
英文摘要Support Vector Machine(SVM) is an important tool in machine learning. SVM has shown good performances in applications such as face detection, object recognition and information retrieval.Current SVM algorithms become slow in dealing with large data sets. Therefore, exploration of efficient SVM algorithms is important in both theory and practice. Since geometric descriptions of SVM have been known, it is possible using geometric algorithms in SVM. This study is focused on the geometric algorithms for Support Vector Machine and kernel clustering. The main work is summarized as follows. 1. Gilbert algorithm is a very popular algorithm in colli-sion detection in robotics and also in classification in pattern recognition. However, the major drawback of Gilbert algorithm is that in many cases it becomes very slow as it approaches the final solution and the vertices selection vibrates in these cases. In this work, a) We prove theoretically that when the selection of vertices vibrates among several points, the algorithm will converge to the hyper-plane determined by these points. b)Based on the above results, an improved Gilbert algorithm for computing the distance between two convex polytopes is presented. The algorithm can avoid the slow convergence of the original one. Numerical simulation demonstrates the effectiveness and advantage of the improved algorithm. 2. We propose a simplified Support Vector Machine. The method is based on visible faces of the convex hull. We use an iterative projection algorithm in searching for the reduced vector set. Experimental results on real data sets as well as synthetic data sets indicate that the proposed method is effective while preserving good generalization performance. 3. Core Vector Machine (CVM) is an efficient kernel method for large data classification. It has prominent advantages in dealing with large data sets in high-dimensional space. In this work, we present a novel geometric framework between CVM and the traditional Support Vector Machine (SVM). It is believed that the obtained geometric relationship will be helpful in analyzing CVM. 4. Kernel Grower is a novel kernel clustering method proposed recently by Camastra and Verri (2005). It shows good performance for various data sets and compares favora-bly with respect to popular clustering algorithms. However, the main drawback of the method is the weak scaling ability in dealing with large data sets. In this work, we propose a scaled-up Kernel Grower method using core-sets, which is significantly faster than the original method for large data clustering. Meanwhile, it can deal with very large data sets. Numerical experiments on benchmark data sets as well as synthetic data sets show the efficiency of the proposed method. The method is also applied to real image segmentation to illustrate its performance.
关键词几何算法 支持向量机 核聚类 凸包最近点问题 最小包含球 Geometric Algorithms Support Vector Machine Kernel Clustering Nearest Point Problem Minimum Enclosing Ball
语种中文
文献类型学位论文
条目标识符http://ir.ia.ac.cn/handle/173211/5874
专题毕业生_博士学位论文
推荐引用方式
GB/T 7714
常亮. 支持向量机分类及核聚类中的几何算法研究[D]. 中国科学院自动化研究所. 中国科学院研究生院,2005.
条目包含的文件
文件名称/大小 文献类型 版本类型 开放类型 使用许可
CASIA_20051801462801(5300KB) 暂不开放CC BY-NC-SA
个性服务
推荐该条目
保存到收藏夹
查看访问统计
导出为Endnote文件
谷歌学术
谷歌学术中相似的文章
[常亮]的文章
百度学术
百度学术中相似的文章
[常亮]的文章
必应学术
必应学术中相似的文章
[常亮]的文章
相关权益政策
暂无数据
收藏/分享
所有评论 (0)
暂无评论
 

除非特别说明,本系统中所有内容都受版权保护,并保留所有权利。