Research on "Data Enrichment" Based on Rough Sets (基于Rough Sets的"数据浓缩"研究)
姚杰
Subtype: Master of Engineering
Thesis Advisor: 王珏
1999-06-01
Degree Grantor: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: Data Enrichment; Rough Sets (RS); Knowledge Discovery; Reduct; Discernibility Matrix; Parallel Computing; Discretization; Domain Explanation
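Abstract: This thesis studies "data enrichment" using the Rough Set (RS) method. "Data enrichment" is the process of "evaporating" redundant data from a database so that the database takes on a relatively natural and concise representation, making its contents easier for users to understand. It is essentially one part of the "data mining" process, except that its method is restricted to RS and it places more emphasis on "human understanding of the database". Our aim is to improve the computational performance of the RS method and broaden its range of application, so that it becomes an efficient, general-purpose "data mining" algorithm. Our research covers the following. To handle very large databases, we studied the parallelization of the RS reduct algorithm. For distributed-memory parallel architectures, we designed a "message passing" parallel RS reduct algorithm and implemented the parallel program on the Dawning 1000A (曙光1000A) parallel computer. We also improved the traditional RS reduct algorithm; the improved algorithm overcomes the storage-space limitation and offers greater parallel scalability. To handle continuous data, we studied the discretization of continuous attributes based on RS reducts. For the "data enrichment" problem, we adopted the RS-reduct-based discretization framework proposed by Son H. Nguyen [18] and designed "fine-to-coarse", "coarse-to-fine", and dynamic algorithms for discretizing continuous attributes, making RS a unified method for both discrete and continuous data. To emphasize that the purpose of "data enrichment" is "human understanding of data", we gave a domain interpretation of "data enrichment" results from molecular biology. By analyzing the application of the RS reduct algorithm, we gained a deeper understanding of the nature of the algorithm and identified directions for its further improvement.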
Other Abstract: This dissertation deals with the problem of "Data Enrichment": the process of reducing the size of a database by deleting redundant data under the constraint of "decision consistency", using the Rough Sets method. As a specific kind of "Data Mining", "Data Enrichment" uses Rough Sets as its theoretical background and basic tool, and emphasizes "human understanding of databases". To make Rough Sets a general, efficient method of "Data Mining", we have investigated the following issues: 1. Parallelization of the Rough Sets reduct algorithm. To apply the Rough Sets method to very large databases, a "message passing" style parallel Rough Sets reduct algorithm has been designed and implemented on the "Dawning 1000A" parallel computer. We have also presented a new Rough Sets reduct algorithm with space complexity of O(n). 2. Discretization methods based on the Rough Sets reduct. To apply the Rough Sets method to continuous data, several discretization methods, including the "from fine to rough", "from rough to fine", and adaptive methods, have been presented under the framework of "discretization based on RS reduct" [18]. Experiments have been done on the UCI machine learning datasets that contain continuous data. 3. Explanation of the results of "Data Enrichment". To make the "Data Enrichment" process meaningful, the computed results from an example of "Promoter Recognition" have been compared with related domain theory. The potential and shortcomings of the Rough Sets method have also been investigated.
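As an illustration of the reduct idea the abstract refers to (deleting redundant attributes while keeping decisions consistent), here is a minimal Python sketch. It is not the thesis's algorithm: the function names `is_consistent` and `greedy_reduct`, the toy decision table, and the backward-elimination strategy are all assumptions made for this example. The result is one reduct, not necessarily the smallest one.

```python
from typing import Dict, List, Sequence, Tuple

# A row is (condition attribute values, decision). Hypothetical layout.
Row = Tuple[Tuple[object, ...], object]


def is_consistent(rows: Sequence[Row], attrs: Sequence[int]) -> bool:
    """True if rows that agree on `attrs` never disagree on the decision."""
    seen: Dict[Tuple[object, ...], object] = {}
    for values, decision in rows:
        key = tuple(values[a] for a in attrs)
        # setdefault stores the first decision seen for this key.
        if seen.setdefault(key, decision) != decision:
            return False
    return True


def greedy_reduct(rows: Sequence[Row], n_attrs: int) -> List[int]:
    """Drop attributes one at a time while decision consistency holds."""
    kept = list(range(n_attrs))
    if not is_consistent(rows, kept):
        raise ValueError("decision table is inconsistent to begin with")
    for a in range(n_attrs):
        trial = [b for b in kept if b != a]
        if is_consistent(rows, trial):
            kept = trial  # attribute `a` was redundant
    return kept


# Toy decision table (assumed data): attribute 0 alone fixes the decision.
rows: List[Row] = [
    ((1, 0, 1), "yes"),
    ((1, 1, 1), "yes"),
    ((0, 0, 1), "no"),
    ((0, 1, 0), "no"),
]

print(greedy_reduct(rows, 3))  # [0]
```

On the toy table, attributes 1 and 2 can be removed without making any two rows with different decisions indistinguishable, so the sketch keeps only attribute 0.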
Shelf Number: XWLW521
Other Identifier: 521
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/7269
Collection: Graduates_Master's Theses
Recommended Citation
GB/T 7714
姚杰. 基于Rough Sets的"数据浓缩"研究[D]. 中国科学院自动化研究所. 中国科学院自动化研究所,1999.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.