Requirement-Oriented Feature Selection
Original Title: 基于需求的特征选择
Author: 梁洪力 (Liang Hongli)
Subtype: Doctor of Engineering
Thesis Advisor: 王珏 (Wang Jue)
2009-03-26
Degree Grantor: Graduate University of the Chinese Academy of Sciences (中国科学院研究生院)
Place of Conferral: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Degree Discipline: Computer Application Technology
Keyword: Machine Learning; Feature Selection; Reduct Theory; Rashomon; Multiple Classifier System; User's Requirement
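Abstract: Feature selection is an important research direction in machine learning. In recent years in particular, many practical problems characterized by high dimensionality and small sample sizes have emerged in data analysis areas such as natural language processing, bioinformatics, economics and finance, networking and telecommunications, and medicine, and feature selection has once again become a focus of attention. This time, however, the purpose of studying feature selection is not to save resources but to improve classifier accuracy and to meet people's growing demand for personalization.

The Rashomon problem is a common phenomenon in feature selection. For traditional feature selection objectives, the Rashomon phenomenon may be a disaster. For requirement-oriented feature selection, however, it may offer exactly the possibility of finding different solutions for different requirements. Traditional feature selection algorithms usually look for a single feature subset that optimizes some evaluation criterion. Because the selection process is independent of the user's requirements, such algorithms generally cannot satisfy users' differing requirements. User-requirement-oriented feature selection algorithms emerged to meet the personalized requirements of different users. Here, a user's requirement is described as a preference over features and is represented by a linear order on the features. The core problems of user-requirement-oriented feature selection are therefore how to evaluate how well a feature subset satisfies the user's requirement and how to find the solution that satisfies it best.

In recent years, research on improving classifier accuracy and model stability through effective feature selection in high-dimensional problems has injected fresh blood into feature selection research. This line of work is often vividly called the variable sparsification problem. In feature selection terms, such methods are called Embedded feature selection algorithms.

Another way to improve classification accuracy and model stability on high-dimensional, small-sample problems is to design a multiple classifier system based on substitute training sets. In such a system, each substitute training set and its classifier model can be seen as describing one facet of the real problem. By combining descriptions of different facets, one hopes to obtain a more stable classifier model. To this end, we design a multi-local-classifier system that attempts to solve high-dimensional, small-sample problems. Each substitute training set is a training subset based on some reduct, and different substitute training sets require different reducts; an algorithm-requirement-oriented feature selection algorithm is therefore proposed. In addition, a real high-dimensional, small-sample problem is introduced to verify the effectiveness of the multi-local-classifier system.

The main results of this thesis are:
1. A fairly detailed survey of the history and main methods of feature selection algorithms;
2. A discussion of the Rashomon phenomenon in feature selection;
3. An explicit definition of the user-requirement-oriented optimal reduct, a proof that the optimal reduct problem based on attribute order is NP-hard, and a domain-greedy reduct algorithm;
4. An algorithm-requirement-oriented feature selection algorithm and, built on it, a multi-local-classifier system. A typical class of high-dimensional, small-sample problem, the WSD problem, is analyzed and used as a practical example to verify the effectiveness of the multi-local-classifier system.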
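To make the idea of a reduct guided by a user's attribute order concrete, the following is a minimal sketch and not the thesis's algorithm, which this record does not describe. It assumes a consistent decision table with discrete attributes, and the names `is_consistent`, `order_guided_reduct`, `rows`, `labels`, and the `preference` list (most preferred attribute first) are hypothetical. It simply tries to drop attributes from least to most preferred and keeps a drop whenever the table stays consistent.

```python
def is_consistent(rows, labels, attrs):
    """True if no two rows agree on all `attrs` but carry different labels."""
    seen = {}
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, label) != label:
            return False
    return True


def order_guided_reduct(rows, labels, preference):
    """Backward elimination: drop attributes from least to most preferred.

    `preference` lists attribute indices, most preferred first (an assumed
    encoding of the user's linear feature order).
    """
    kept = list(preference)
    if not is_consistent(rows, labels, kept):
        raise ValueError("decision table is inconsistent on the full attribute set")
    for attr in reversed(preference):
        trial = [a for a in kept if a != attr]
        if is_consistent(rows, labels, trial):
            kept = trial  # attr is not needed given what is still kept
    return kept


# Toy table: attribute 0 alone or attribute 2 alone determines the label,
# so two different reducts exist (a small Rashomon-style situation).
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
labels = ["a", "a", "b", "b"]

print(order_guided_reduct(rows, labels, [0, 1, 2]))  # user prefers attribute 0
print(order_guided_reduct(rows, labels, [2, 1, 0]))  # user prefers attribute 2
```

On this toy table the two preference orders yield two different reducts, which is the sense in which multiple equally valid solutions can serve different requirements. A single elimination pass is enough to reach a subset from which no attribute can be removed, because removing attributes can only make an inconsistent table stay inconsistent.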
Other Abstract: Feature selection is one of the most important research directions in machine learning. In recent years in particular, with the appearance of many high-dimensional, small-sample problems in areas such as natural language processing, bioinformatics, economics and finance, networking and telecommunications, and medical data analysis, feature selection has once again become a focus of attention. The objective this time, however, is not to save resources but to improve classification accuracy or to meet users' growing requirements.

The Rashomon problem is a common phenomenon in feature selection. For traditional feature selection, the Rashomon phenomenon may be a disaster. For requirement-oriented feature selection, however, it may provide the opportunity to offer different solutions for different requirements. Traditional feature selection algorithms usually seek a feature subset that optimizes some evaluation function. Because the selection process is independent of the user's demands, these algorithms often cannot meet the different needs of users. To meet the personalized needs of different users, user-oriented feature selection algorithms should be developed. Here, the user's requirement is described as a preference over the features, expressed as a linear order of the features. The core problems of user-oriented feature selection are therefore how to evaluate how well a feature subset satisfies the user's requirement and how to find the optimal feature subset that meets it.

Recently, effective feature selection algorithms that improve classification accuracy and model stability have been developed for high-dimensional problems, bringing new ideas to feature selection research. This kind of problem is often called the sparseness problem, and the corresponding feature selection methods are called Embedded feature selection algorithms.

Another way to improve classification accuracy and model stability is to design a multi-classifier system based on substitute training sets. In a multi-classifier system, each substitute training set and its corresponding classifier can be regarded as one aspect of the problem. We hope to obtain a more stable classifier by combining descriptions of different aspects of the problem. Therefore, we design a multi-local-classifier syst...
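The combination of classifiers trained on different substitute training sets can be sketched as follows. This is an illustrative assumption rather than the system built in the thesis: each local classifier here is a plain lookup table over the training rows projected onto one attribute subset, it abstains on unseen patterns, and the ensemble takes a majority vote. The names `train_local`, `predict_local`, `predict_ensemble`, and the toy data are hypothetical.

```python
from collections import Counter


def train_local(rows, labels, reduct):
    """Build a lookup table from the training rows projected onto `reduct`."""
    table = {}
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in reduct)
        table.setdefault(key, Counter())[label] += 1
    return reduct, table


def predict_local(model, row):
    """Most frequent label for the projected row, or None to abstain."""
    reduct, table = model
    votes = table.get(tuple(row[a] for a in reduct))
    return votes.most_common(1)[0][0] if votes else None


def predict_ensemble(models, row):
    """Majority vote over the local classifiers that do not abstain."""
    predictions = (predict_local(m, row) for m in models)
    votes = Counter(p for p in predictions if p is not None)
    return votes.most_common(1)[0][0] if votes else None


# Toy data: attribute 0 alone and attribute 2 alone each determine the label.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
labels = ["a", "a", "b", "b"]

# One local classifier per attribute subset (assumed here to be reducts).
models = [train_local(rows, labels, r) for r in ([0], [2])]

print(predict_ensemble(models, (0, 1, 1)))  # both local views vote "a"
print(predict_ensemble(models, (1, 5, 0)))  # attribute 1 is unseen but ignored
```

Because each local classifier looks only at its own attribute subset, a test row with an unfamiliar value in an ignored attribute can still be classified, which is one plausible reason for combining several local views on small samples.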
Shelf Number: XWLW1336
Other Identifier: 200518014629103
Language: Chinese
Document Type: Dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/6142
Collection: Graduates_Doctoral Dissertations
Recommended Citation
GB/T 7714
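梁洪力 (Liang Hongli). 基于需求的特征选择 (Requirement-Oriented Feature Selection) [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of the Chinese Academy of Sciences, 2009.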
Files in This Item:
File Name/Size: CASIA_20051801462910 (2048KB)
Access: Not yet open (暂不开放)
License: CC BY-NC-SA