English abstract | Machine learning can be divided into two kinds, structured and unstructured, according to whether the objects to be learned can be represented in a relational database. Structured learning can in turn be divided into two kinds: one deals with symbols and the other is based on statistics. In the early 1990s, two groups of mathematicians, building on earlier mathematical results, developed two learning theories: Rough Sets theory, which deals with symbolic data, and the Support Vector Machine, which is based on statistical learning theory. Research to date indicates that these two theories form the basis of structured learning, i.e., they can describe the older algorithms without increasing computational complexity. They give the two kinds of structured learning criteria with a solid mathematical foundation, which reduces the uncertainty in algorithm design and makes it possible to compare algorithms. With the rapid growth of computer networks and information, information and knowledge discovery in massive data, especially discovery aimed at human understanding of the data, needs useful and fast computing techniques. Rough Sets theory, a mathematical theory developed by Polish scientists, is well suited to learning from incomplete and imprecise data and uncertain knowledge. It differs from Fuzzy Sets and statistics and provides a new scientific logic and research method. Moreover, the data reduction technique based on it provides an effective means of intelligent information processing. People are used to viewing things at several levels, and it is characteristic of humans to regard knowledge systems as layers of different granularity. We can now also use "granularity" in knowledge representation in machines. Based on the "roughness" defined in Rough Sets theory, we can partition an information system into several granularities, which form a complete lattice under a partial order. 
A "granule" is a clump of objects drawn together by indistinguishability, similarity, proximity or functionality. When we convert an information system to a "coarser" granularity under certain constraints, we obtain a much more concise form of the information system. This dissertation analyzes the efficiency of data reduction using Rough Sets theory and describes a method for formalizing information granularity using algebra, logic and topology. By analyzing the relationships among the B-algebra CG(B), the B-logic L(B) and the topology (X,f, г ), we give granularity descriptions of reduction, discretization, etc. We contrast Fuzzy Sets theory with Rough Sets theory and identify the difference between the two kinds of information uncertainty. The dissertation also covers Statistical Learning Theory and the Support Vector Machine, and their application to the discretization of continuous attributes. |