Information Granularity Computation Based on Rough Sets and Its Applications (基于Rough Sets的信息粒度计算及其应用)


	Information Granularity Computation Based on Rough Sets and Its Applications
	邵健
	2000-06-01
Degree type	Master of Engineering
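Chinese abstract	Depending on whether the objects to be learned can be represented in the form of a relational database, machine learning can be divided into structured and unstructured machine learning. Depending on the mathematical nature of the data and the differing requirements placed on learning, structured machine learning can be further divided into symbol-based and statistics-based machine learning. For these two classes, in the early 1990s two groups of mathematicians from Eastern Europe brought theoretical results they had developed in mathematical research during the 1970s and 1980s into machine learning research, giving rise to Rough Sets (RS) theory for symbol-based machine learning and the Support Vector Machine (SVM) for statistics-based machine learning. Current research results indicate that these two theories can serve as the theoretical foundation for algorithm design in the two classes above; in other words, each can describe the main existing algorithms of its class without increasing computational complexity. Because both theories rest on solid mathematical foundations, they have each "standardized" their class of machine learning, greatly reduced the arbitrariness of algorithm design, and given comparisons among existing machine learning algorithms a theoretical basis. With the rapid development of the information industry and computer networks, knowledge extraction and information discovery from massive data urgently need a fast and effective computational method, in order to fit people's understanding of data and to serve people better. Rough Sets is a mathematical theory proposed by the Polish scientist Pawlak and others. It is suited to studying methods for representing, learning from, and generalizing over incomplete data and imprecise knowledge. Unlike traditional probability and statistics and fuzzy set theory, it considers the uncertainty of information from the perspective of knowledge as a whole. It also provides an effective algorithm for reducing databases. Rough set theory not only offers information science and cognitive science a new scientific logic and research method, but also provides effective means for intelligent information processing. When we observe things at different levels, we divide information systems and knowledge systems into levels of granularity; this is a characteristic of people. Can a machine likewise describe data at different levels of information granularity? Starting from the roughness defined by rough set theory, we can divide an information system into granularity levels that have the structure of a complete partially ordered lattice. A granule is a set of objects grouped by their indiscernibility, similarity, proximity, or functionality. Making the granularity coarser within a prescribed error bound can greatly simplify the data and yield a concise representation of the information system. This thesis analyzes the effectiveness of data reduction with Rough Sets; describes granularity formally from three aspects, namely algebraic structure, logical system, and topological space; analyzes the relationships among the B-algebra CG(B), the B-logic L(B), and the topological space (X, f, г); and gives granular descriptions of problems such as reduction and discretization. Through an analysis of RS and quotient structures, the different foundations of Rough Sets and Fuzzy Sets with respect to information uncertainty are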
English abstract	Machine learning can be divided into structured and non-structured learning, according to whether the objects to be learned can be represented as a relational database. Structured learning can in turn be divided into two kinds: one deals with symbols and the other is based on statistics. In the early 1990s, two groups of mathematicians, drawing on results from earlier mathematical research, developed two learning theories: Rough Sets theory, which deals with symbolic data, and the Support Vector Machine, which builds on statistical learning theory. Research to date indicates that these two theories can serve as the foundations of structured learning, i.e., they can describe the existing algorithms without increasing computational complexity. With their solid mathematical foundations, the two theories provide criteria for the two kinds of structured learning, which reduces the arbitrariness of algorithm design and gives a basis for comparing algorithms. With the rapid growth of computer networks and information, knowledge and information discovery in massive data, especially discovery that fits human understanding of data, needs a fast and effective computing technique. Rough Sets theory, a mathematical theory developed by Polish scientists, is well suited to learning from incomplete and imprecise data and uncertain knowledge. It differs from Fuzzy Sets and statistics, and provides a new scientific logic and research method. Moreover, the data reduction technique based on it provides an effective means of intelligent information processing. We are used to seeing things at several levels. It is a human characteristic to regard our knowledge systems as different granularity levels. We can now also use "granularity" for knowledge representation in machines. Based on the "roughness" defined in Rough Sets theory, we can partition an information system into several granularities that form a complete partially ordered lattice. A "granule" is a clump of objects drawn together by indistinguishability, similarity, proximity or functionality. When we change an information system to a "coarser" granularity within given limits, we obtain a much more concise form of the information system. This dissertation analyzes the effectiveness of data reduction using Rough Sets theory and describes a method for formalizing information granularity using algebra, logic and topology. By analyzing the relationships among the B-algebra CG(B), the B-logic L(B) and the topology (X, f, г), we give granularity descriptions of reduction, discretization, etc. We contrast Fuzzy Sets theory with Rough Sets theory and identify the difference between the two kinds of information uncertainty. Another part of this dissertation concerns Statistical Learning Theory and the Support Vector Machine, and their application to the discretization of continuous attributes.
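The notions of granule, approximation and roughness mentioned in the abstract can be illustrated with a minimal Python sketch. It uses the standard Pawlak definitions (roughness = 1 - |lower approximation| / |upper approximation|). The toy table, the attribute names and the function names are illustrative assumptions and are not taken from the thesis.

```python
from collections import defaultdict

def partition(table, attrs):
    """Group object ids into granules: objects with equal values on attrs."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())

def approximations(table, attrs, target):
    """Return the (lower, upper) approximations of target under attrs."""
    lower, upper = set(), set()
    for block in partition(table, attrs):
        if block <= target:   # granule lies entirely inside the target set
            lower |= block
        if block & target:    # granule overlaps the target set
            upper |= block
    return lower, upper

def roughness(table, attrs, target):
    """Pawlak roughness: 1 - |lower| / |upper| (0.0 for an empty upper set)."""
    lower, upper = approximations(table, attrs, target)
    return 1 - len(lower) / len(upper) if upper else 0.0

# Hypothetical information system: four objects, two symbolic attributes.
TABLE = {
    "x1": {"color": "red", "size": "small"},
    "x2": {"color": "red", "size": "large"},
    "x3": {"color": "blue", "size": "small"},
    "x4": {"color": "blue", "size": "small"},
}
TARGET = {"x1", "x3"}

fine = roughness(TABLE, ["color", "size"], TARGET)   # finer granularity
coarse = roughness(TABLE, ["color"], TARGET)         # coarser granularity
print(fine, coarse)
```

In this toy case, dropping the `size` attribute merges granules, so the description becomes shorter while the roughness of the target set rises from 2/3 to 1. This is the trade-off between conciseness and error that the abstract refers to when it speaks of coarsening within given limits.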
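Keywords	粒度 (granularity); 粗糙集 (rough sets); 约简 (reduction); 差别矩阵 (discernibility matrix); 离散化 (discretization); 统计学习理论 (statistical learning theory); 支持向量机 (support vector machine); Granularity; Granule; Rough Sets; Reduction; Discernibility Matrix; Discretization; Statistical Learning Theory; Support Vector m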
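Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/7307
Collection	Graduates_Master's Theses
Recommended citation (GB/T 7714)	邵健. Information Granularity Computation Based on Rough Sets and Its Applications [基于Rough Sets的信息粒度计算及其应用][D]. Institute of Automation, Chinese Academy of Sciences. Institute of Automation, Chinese Academy of Sciences, 2000.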