This dissertation addresses the problem of "Data Enrichment": reducing the size of a database by deleting redundant data under the constraint of decision consistency, using the Rough Sets method. As a specific kind of Data Mining, Data Enrichment takes Rough Sets as its theoretical background and basic tool, and emphasizes human understanding of databases. To make Rough Sets a general and efficient Data Mining method, we have investigated the following issues:

1. Parallelization of the Rough Sets reduct algorithm. To apply the Rough Sets method to very large databases, a message-passing parallel Rough Sets reduct algorithm has been designed and implemented on the Dawning 1000A parallel computer. We have also presented a new Rough Sets reduct algorithm with O(n) space complexity.

2. Discretization methods based on Rough Sets reduct. To apply the Rough Sets method to continuous data, several discretization methods, including "from fine to rough", "from rough to fine", and an adaptive method, have been presented within the framework of "discretization based on RS reduct" [18]. Experiments have been carried out on UCI machine learning datasets that contain continuous attributes.

3. Explanation of the results of Data Enrichment. To make the Data Enrichment process meaningful, the results computed for a Promoter Recognition example have been compared with the related domain theory.

The potential and shortcomings of the Rough Sets method have also been investigated.
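The core operation named above, computing a reduct under the constraint of decision consistency, can be illustrated with a small sketch. This is not the dissertation's (parallel) algorithm; it is a minimal exhaustive-search illustration of the underlying idea: a reduct is a smallest attribute subset on which objects that agree on the condition attributes also agree on the decision. All function and variable names here are illustrative assumptions.

```python
from itertools import combinations

def is_consistent(rows, attrs, decision):
    """Decision consistency: rows identical on `attrs`
    must carry the same decision value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen and seen[key] != row[decision]:
            return False
        seen[key] = row[decision]
    return True

def reduct(rows, attrs, decision):
    """Smallest attribute subset (found by exhaustive search over
    subset sizes) that keeps the decision table consistent.
    Assumes the full attribute set is itself consistent."""
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            if is_consistent(rows, subset, decision):
                return set(subset)
    return set(attrs)
```

On a toy table where attribute `b` alone determines the decision `d`, `reduct` discards `a`, which is the kind of redundant data Data Enrichment deletes. Practical algorithms avoid the exponential subset search, e.g. by greedy selection on attribute significance.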
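The "from fine to rough" discretization strategy mentioned in item 2 can likewise be sketched under stated assumptions: start from the finest candidate partition (a cut at every midpoint between adjacent distinct values) and greedily drop cuts as long as the binned column stays decision-consistent. This is a one-attribute illustration of the general direction, not the dissertation's algorithm; the names and the greedy drop order are assumptions.

```python
def discretize(values, labels):
    """'From fine to rough' sketch for one continuous attribute:
    begin with all candidate cuts, then remove each cut whose
    removal preserves decision consistency of the binned values."""
    pairs = sorted(zip(values, labels))
    # finest partition: a cut between every pair of distinct values
    cuts = [(pairs[i][0] + pairs[i + 1][0]) / 2
            for i in range(len(pairs) - 1)
            if pairs[i][0] != pairs[i + 1][0]]

    def consistent(trial_cuts):
        # each bin induced by the cuts must be pure in its label
        bins = {}
        for v, y in pairs:
            b = sum(v > c for c in trial_cuts)
            if b in bins and bins[b] != y:
                return False
            bins[b] = y
        return True

    kept = list(cuts)
    for c in list(cuts):
        trial = [x for x in kept if x != c]
        if consistent(trial):
            kept = trial
    return kept
```

For values `[1, 2, 3, 4]` with labels `[0, 0, 1, 1]`, the three candidate cuts collapse to the single cut `2.5`, the roughest partition that still separates the two decision classes.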