Information Granularity Computation Based on Rough Sets and Its Applications (基于Rough Sets的信息粒度计算及其应用)
邵健
Subtype: Master of Engineering
Thesis Advisor: 王珏
2000-06-01
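Degree Grantor: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Place of Conferral: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: Granularity; Granule; Rough Sets; Reduction; Discernibility Matrix; Discretization; Statistical Learning Theory; Support Vector Machine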
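Abstract: Depending on whether the objects to be learned can be represented in relational-database form, machine learning can be divided into structured and unstructured machine learning. Depending on the mathematical nature of the data and on the different demands placed on machine learning, structured machine learning can in turn be divided into symbol-based and statistics-based machine learning. For these two kinds of machine learning, in the early 1990s two groups of mathematicians from Eastern Europe brought theoretical results they had developed in mathematical research during the 1970s and 1980s into machine learning research, yielding Rough Sets (RS) theory for symbol-based machine learning and the Support Vector Machine (SVM) for statistics-based machine learning.
Current research results indicate that these two theories can serve as the theoretical foundation for algorithm design in the two kinds of machine learning above. In other words, without increasing computational complexity, they can respectively describe the main existing algorithms of the two kinds. Because both theories rest on a solid mathematical foundation, each has "standardized" its kind of machine learning, greatly reducing the arbitrariness of algorithm design and giving a theoretical basis for comparing the various existing machine learning algorithms.
With the rapid development of the information industry and computer networks, and in order to fit how people understand data and to serve people better, knowledge extraction and information discovery from massive data urgently need a fast and effective computational method. Rough Sets is a mathematical theory proposed by the Polish scientist Pawlak and others. It is suited to studying methods for representing, learning from, and generalizing over incomplete data and imprecise knowledge. It differs from traditional probability and statistics and from fuzzy set theory, and instead considers the uncertainty of information from the perspective of the global nature of knowledge. It also provides an effective algorithm for reducing databases. Rough set theory not only offers information science and cognitive science a new scientific logic and research method, but also offers effective means for intelligent information processing.
When we observe things at different levels, we are dividing information systems and knowledge systems into levels of granularity; this is a characteristic of people. Can a machine likewise describe data at different levels of information granularity? Starting from the roughness defined by rough set theory, we can partition an information system into granularity levels that have the structure of a complete partially ordered lattice. A granule is a set of objects grouped by their indiscernibility, similarity, proximity, or functionality. Coarsening the granularity within a prescribed error bound can greatly simplify the data and yield a concise representation of the information system.
This thesis analyzes the effectiveness of data reduction with Rough Sets; describes granularity formally from three perspectives, namely algebraic structure, logical system, and topological space; analyzes the relationships among the B-algebra CG(B), the B-logic L(B), and the topological space (X,f,г); and gives granularity-based descriptions of problems such as reduction and discretization. Through an analysis of RS and quotient structures, the different foundations of Rough Set and Fuzzy Set with respect to information uncertainty are ...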
Other Abstract: Machine learning can be divided into structured and unstructured learning, according to whether the objects to be learned can be represented as a relational database. Structured learning can in turn be divided into two kinds: one deals with symbols and the other is based on statistics. In the early 1990s, two groups of mathematicians, drawing on results from their earlier mathematical research, developed two learning theories: Rough Sets theory, which deals with symbolic data, and the Support Vector Machine, which builds on statistical learning theory. Research to date indicates that these two theories can serve as the foundations of structured learning, i.e., they can describe the existing algorithms without increasing computational complexity. With their solid mathematical foundations, the two theories provide criteria for the two kinds of structured learning, which reduces the arbitrariness of algorithm design and gives a basis for comparing algorithms. With the rapid growth of computer networks and information, information and knowledge discovery in massive data, especially discovery that fits how humans understand data, needs a fast and effective computing technique. Rough Sets theory, a mathematical theory developed by Polish scientists, is well suited to learning from incomplete and imprecise data and uncertain knowledge. It differs from Fuzzy Sets and statistics and provides a new scientific logic and research method. Moreover, the data reduction technique based on it provides an effective means of intelligent information processing. We are used to seeing things at several levels. It is a human characteristic to regard knowledge systems as consisting of different levels of granularity. We can now also use "granularity" in knowledge representation in machines. Based on the "roughness" defined in Rough Sets theory, we can partition an information system into several granularities that form a complete semi-order lattice.
A "granule" is a clump of objects drawn together by indistinguishability, similarity, proximity or functionality. When we coarsen the granularity of an information system within specified limits, we obtain a much more concise form of that system. This dissertation analyzes the effectiveness of data reduction using Rough Sets theory and describes a method for formalizing information granularity using algebra, logic and topology. By analyzing the relationships among the B-algebra CG(B), the B-logic L(B) and the topology (X,f,г), we give granularity descriptions of reduction, discretization, etc. We contrast Fuzzy Sets theory with Rough Sets theory and describe the difference between the two kinds of information uncertainty. Another part of this dissertation concerns Statistical Learning Theory and the Support Vector Machine, and their application to the discretization of continuous attributes.
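The notions of granule and roughness used in the abstract can be illustrated with a minimal Python sketch. It follows the standard Pawlak definitions (indiscernibility classes, lower and upper approximations, roughness as 1 minus the ratio of their sizes) and is not taken from the thesis itself. The toy table, the attribute names, and the function names `granules`, `approximations` and `roughness` are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def granules(table, attrs):
    """Group object ids into indiscernibility classes: two objects share
    a granule when they agree on every attribute in attrs."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())

def approximations(table, attrs, target):
    """Return the (lower, upper) approximations of the set target."""
    lower, upper = set(), set()
    for g in granules(table, attrs):
        if g <= target:   # granule lies entirely inside the target
            lower |= g
        if g & target:    # granule overlaps the target
            upper |= g
    return lower, upper

def roughness(table, attrs, target):
    """Pawlak roughness: 1 - |lower| / |upper| (0.0 for an empty upper set)."""
    lower, upper = approximations(table, attrs, target)
    return 1 - len(lower) / len(upper) if upper else 0.0

# Hypothetical toy information system (assumption, not data from the thesis).
table = {
    "x1": {"color": "red", "size": "small"},
    "x2": {"color": "red", "size": "small"},
    "x3": {"color": "red", "size": "large"},
    "x4": {"color": "blue", "size": "large"},
}
target = {"x1", "x3"}
fine = ["color", "size"]   # finer granularity: both attributes
coarse = ["color"]         # coarser granularity: one attribute dropped

print(granules(table, fine))
print(roughness(table, fine, target), roughness(table, coarse, target))
```

In this toy table, dropping the `size` attribute merges granules, so the description becomes shorter while the roughness of the target set rises. That is the trade-off the abstract refers to when it speaks of coarsening granularity within a prescribed error bound.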
Shelf Number: XWLW574
Other Identifier: 574
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/7307
Collection: Graduates_Master's Theses (毕业生_硕士学位论文)
Recommended Citation
GB/T 7714
邵健. 基于Rough Sets的信息粒度计算及其应用[D]. 中国科学院自动化研究所. 中国科学院自动化研究所,2000.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.