Data description is a special task of data mining: given a user's requirements, it transforms an information system (data set) defined on symbolic domains into human-readable texts of varying conciseness and, at the same time, analyzes the exceptions produced during the transformation. This task conforms to the "rule-plus-exception" principle of cognitive psychology. It involves three points: (1) finding solutions according to the user's requirements, (2) obtaining texts of varying conciseness, and (3) analyzing exceptions. We employ the reduct theory of rough set theory as the tool to formalize the problems of data description, and we design the corresponding algorithms. The notions of "positive region" and "boundary region" are inadequate for representing rules and exceptions directly, so we modify them into the "cognitive positive region" and the "cognitive boundary region", respectively, in order to depict the rule-plus-exception model accurately. The positive region serves as the basis of reduct theory and is unique for a given information system, whereas the cognitive positive region does not satisfy this uniqueness condition; we therefore redefine and re-prove all the notions and properties originally defined on the basis of the positive region. Users usually want a concise description with respect to a given requirement, so we define the text granule, via the notion of a "reduct" based on the cognitive positive region, as a concise description of the data set. Traditionally, research on rough set theory has paid little attention to the structure of the boundary region. However, "exceptions" are closely related to the boundary region. Hence, we investigate the structure and properties of the boundary region in order to gain insight into the structure of the exception space and to lay a foundation for exception analysis.
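To make the standard notions concrete, the following is a minimal sketch, assuming the classical Pawlak definitions of positive region, boundary region and reduct on a decision table. It does not implement the cognitive positive region or cognitive boundary region proposed here, whose definitions are not given in this abstract. The toy table, the decision attribute name `d`, and all function names are hypothetical. Under a rule-plus-exception reading, objects in the positive region are covered by consistent rules, while objects in the boundary region are the candidates for exceptions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy decision table: condition attributes "a", "b", "c",
# decision attribute "d". Rows 2 and 3 agree on a, b, c but differ on d.
TABLE = [
    {"a": "x", "b": "p", "c": "u", "d": "yes"},
    {"a": "x", "b": "p", "c": "v", "d": "yes"},
    {"a": "y", "b": "q", "c": "u", "d": "no"},
    {"a": "y", "b": "q", "c": "u", "d": "yes"},
    {"a": "x", "b": "q", "c": "v", "d": "no"},
]


def blocks(table, attrs):
    """Equivalence classes (lists of row indices) of the indiscernibility relation IND(attrs)."""
    groups = defaultdict(list)
    for i, row in enumerate(table):
        groups[tuple(row[a] for a in attrs)].append(i)
    return list(groups.values())


def positive_region(table, attrs, decision="d"):
    """POS_attrs(D): rows whose IND(attrs) class carries a single decision value."""
    pos = set()
    for block in blocks(table, attrs):
        if len({table[i][decision] for i in block}) == 1:
            pos.update(block)
    return pos


def boundary_region(table, attrs, decision="d"):
    """Union of the boundary regions of all decision classes, i.e. U minus POS_attrs(D)."""
    return set(range(len(table))) - positive_region(table, attrs, decision)


def reducts(table, conditions, decision="d"):
    """Brute force: all minimal subsets B of conditions with POS_B(D) == POS_C(D).

    Subsets are enumerated by increasing size, and supersets of already-found
    reducts are skipped, so every subset returned is minimal. Exponential in
    the number of attributes; for illustration only.
    """
    full = positive_region(table, conditions, decision)
    found = []
    for k in range(len(conditions) + 1):
        for subset in combinations(conditions, k):
            if any(set(r) <= set(subset) for r in found):
                continue
            if positive_region(table, list(subset), decision) == full:
                found.append(subset)
    return found


if __name__ == "__main__":
    conds = ["a", "b", "c"]
    print(sorted(positive_region(TABLE, conds)))  # rows covered by consistent rules
    print(sorted(boundary_region(TABLE, conds)))  # candidate exceptions
    print(reducts(TABLE, conds))
```

On this toy table the positive region is rows 0, 1 and 4, the boundary region is rows 2 and 3, and the reducts are `('a', 'b')` and `('b', 'c')`. Brute-force enumeration is chosen only for clarity; it is exactly the kind of approach that does not scale to large information systems.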
To identify exceptions effectively, we design a special discernibility matrix to analyze the structure of the boundary region and the process by which exceptions are produced, and we present an approach for identifying exceptions based on the "core". "Core" and "reduct" are two basic notions in reduct theory. The core has an important property: if a core attribute is removed from a given information system, the boundary region of the information system changes. This property is the basis for computing exceptions. In addition, there is a critical relation between reducts and the core: if a new information system is constructed from a reduct of a given information system, then all attributes of the new information system belong to its core. This implies that, once a reduct has been computed from the given information system, removing the attributes of the reduct step by step produces texts of varying conciseness together with the corresponding exceptions. The precondition for employing the above approach to describe large-scale information systems is to find fast algorithms for comput