This dissertation addresses the problem of "Data Enrichment": reducing the size of a database by deleting redundant data under the constraint of decision consistency, using the Rough Sets method. As a specific kind of Data Mining, Data Enrichment takes Rough Sets as its theoretical background and basic tool, and emphasizes human understanding of databases. To make Rough Sets a general and efficient Data Mining method, we have investigated the following issues:

1. Parallelization of the Rough Sets reduct algorithm. To apply the Rough Sets method to very large databases, a message-passing parallel Rough Sets reduct algorithm has been designed and implemented on the Dawning 1000A parallel computer. We have also presented a new Rough Sets reduct algorithm with O(n) space complexity.

2. Discretization methods based on Rough Sets reduct. To apply the Rough Sets method to continuous data, several discretization methods, including "from fine to rough", "from rough to fine", and an adaptive method, have been presented within the framework of "discretization based on RS reduct" [18]. Experiments have been carried out on UCI machine learning datasets that contain continuous attributes.

3. Explanation of the results of Data Enrichment. To make the Data Enrichment process meaningful, the results computed for a Promoter Recognition example have been compared with the related domain theory.

The potential and shortcomings of the Rough Sets method have also been investigated.
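The core operation named above, computing a reduct under the constraint of decision consistency, can be illustrated with a small sketch. This is not the dissertation's (parallel) algorithm; it is a minimal exhaustive-search illustration of the underlying idea: a reduct is a smallest attribute subset on which objects that agree on the condition attributes also agree on the decision. All function and variable names here are illustrative assumptions.

```python
from itertools import combinations

def is_consistent(rows, attrs, decision):
    """Decision consistency: rows identical on `attrs`
    must carry the same decision value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen and seen[key] != row[decision]:
            return False
        seen[key] = row[decision]
    return True

def reduct(rows, attrs, decision):
    """Smallest attribute subset (found by exhaustive search over
    subset sizes) that keeps the decision table consistent.
    Assumes the full attribute set is itself consistent."""
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            if is_consistent(rows, subset, decision):
                return set(subset)
    return set(attrs)
```

On a toy table where attribute `b` alone determines the decision `d`, `reduct` discards `a`, which is the kind of redundant data Data Enrichment deletes. Practical algorithms avoid the exponential subset search, e.g. by greedy selection on attribute significance.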
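The "from fine to rough" discretization strategy mentioned in item 2 can likewise be sketched under stated assumptions: start from the finest candidate partition (a cut at every midpoint between adjacent distinct values) and greedily drop cuts as long as the binned column stays decision-consistent. This is a one-attribute illustration of the general direction, not the dissertation's algorithm; the names and the greedy drop order are assumptions.

```python
def discretize(values, labels):
    """'From fine to rough' sketch for one continuous attribute:
    begin with all candidate cuts, then remove each cut whose
    removal preserves decision consistency of the binned values."""
    pairs = sorted(zip(values, labels))
    # finest partition: a cut between every pair of distinct values
    cuts = [(pairs[i][0] + pairs[i + 1][0]) / 2
            for i in range(len(pairs) - 1)
            if pairs[i][0] != pairs[i + 1][0]]

    def consistent(trial_cuts):
        # each bin induced by the cuts must be pure in its label
        bins = {}
        for v, y in pairs:
            b = sum(v > c for c in trial_cuts)
            if b in bins and bins[b] != y:
                return False
            bins[b] = y
        return True

    kept = list(cuts)
    for c in list(cuts):
        trial = [x for x in kept if x != c]
        if consistent(trial):
            kept = trial
    return kept
```

For values `[1, 2, 3, 4]` with labels `[0, 0, 1, 1]`, the three candidate cuts collapse to the single cut `2.5`, the roughest partition that still separates the two decision classes.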