Study on Some Issues of Information-Theoretic Classification Learning (信息论分类学习的若干问题研究)
Liu Cantao (刘灿涛)
Subtype: Doctor of Engineering
Thesis Advisor: Hu Baogang (胡包钢)
2010-06-03
Degree Grantor: Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Place of Conferral: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Degree Discipline: Computer Application Technology
Keyword: Information-theoretic Learning; Classifier Evaluating Criterion; Direction Mutual Information; Feature Selection; Renyi's Entropy; Ecological Data; Transparency
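Abstract: Information-theoretic learning (ITL) is a branch of machine learning that has emerged in recent years. It replaces the variance and covariance of conventional statistics with entropy and divergence estimated directly from the data, and it can be used in both supervised and unsupervised learning. The two most important parts of the ITL framework are evaluating learning algorithms with information criteria and proposing new learning algorithms based on the evaluation results. This thesis focuses on classification learning within ITL and addresses three problems. The first is how to evaluate classification models with information-theoretic criteria. The second is a feature selection algorithm based on Renyi-entropy mutual information, motivated by mutual information as an evaluation criterion. The third builds on the feature selection algorithm and uses information-theoretic classification learning to study how to increase the transparency of ecological data. The details are as follows.
① Information-theoretic evaluation criteria for classification models. The thesis studies the convexity of empirical mutual information with respect to its free parameters in binary classification, and analyzes the relation between its optimum and the condition that the target class and the predicted class are independent. On this basis it proposes the concept of direction mutual information and shows theoretically that, with the other free parameters fixed, direction mutual information increases monotonically with classification accuracy. This avoids the possibility that different classification models yield the same empirical mutual information. Building on the evaluation of binary classification models with direction mutual information, the thesis also makes a preliminary exploration of binary classification models with a reject option; the advantage of the proposed method is a better balance between rejection and error reduction. For multi-class problems, the thesis decomposes the task into binary classification problems according to the relations between different classes (rather than between one class and all the others), which extends direction mutual information to the evaluation of multi-class classification models.
② Feature selection based on Renyi-entropy mutual information. Speed is the bottleneck when mining large-scale data sets, so the thesis proposes a feature selection method based on Renyi-entropy mutual information. Building on existing Renyi entropy estimators, it combines the large sample size of such data sets with the law of large numbers to obtain an approximate estimate of Renyi entropy, reducing the computational complexity from O(N^2) to O(N). Combined with the minimum-redundancy maximum-relevance feature selection algorithm, the proposed Renyi entropy estimator is used to estimate mutual information, which lowers the computational complexity of feature selection. Experimental results show that the proposed algorithm is much faster at similar classification accuracy.
③ Application of information-theoretic classification learning to increasing the transparency of ecological data. Taking the forest cover type data set as an example, the thesis studies how information-theoretic classification learning can increase the transparency of ecological data. It first computes the amount of information contained in each attribute of the data set, that is, its entropy. It then analyzes the mutual information between each attribute and the class label to study how relevant each attribute is to the class. Next it studies the mutual information between attributes, revealing their relations and degree of redundancy. Finally, it ranks the attributes by their importance for classification using the proposed feature selection method.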
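The ambiguity mentioned in part ①, where different classifiers share one empirical mutual information value, can be illustrated with a minimal sketch. It assumes the standard plug-in estimate of mutual information from a confusion matrix; the function names and the two example matrices are hypothetical, and the sketch does not implement the thesis's direction mutual information, whose definition is not given in this record.

```python
import math

def mutual_information(confusion):
    """Plug-in mutual information (bits) between true and predicted labels.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    row_sums = [sum(row) for row in confusion]        # true-class counts
    col_sums = [sum(col) for col in zip(*confusion)]  # predicted-class counts
    mi = 0.0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if count == 0:
                continue  # a zero cell contributes nothing to the sum
            p_joint = count / total
            ratio = p_joint * total * total / (row_sums[i] * col_sums[j])
            mi += p_joint * math.log2(ratio)
    return mi

def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[i][i] for i in range(len(confusion))) / total

# Two hypothetical binary classifiers: the second swaps the predicted labels.
good = [[90, 10], [10, 90]]
bad = [[10, 90], [90, 10]]

for name, cm in (("good", good), ("bad", bad)):
    print(name, round(accuracy(cm), 3), round(mutual_information(cm), 4))
```

Both matrices give the same mutual information although their accuracies are 0.9 and 0.1, because the plug-in estimate is unchanged when the predicted labels are swapped.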
Other Abstract: Information-theoretic learning (ITL) is a branch of machine learning that has arisen in recent years. Entropy and divergence, obtained directly from the data, are proposed as ITL criteria in place of the conventional variance and covariance, and can be applied to both supervised and unsupervised learning. Evaluating a learning algorithm with an IT criterion and proposing a new algorithm according to the evaluation result are the two most important tasks in the ITL framework. This thesis focuses on three issues of information-theoretic classification learning. First, we study how to evaluate a classifier with an IT criterion. Second, we propose a new feature selection (FS) algorithm via mutual information based on Renyi's entropy. Last, we study the application of ITL to increasing the transparency of ecological data using the proposed FS algorithm. The details are as follows.
① Evaluating criterion of the classifier based on ITL. We prove that mutual information (MI) is a convex function of its free parameters in the binary classifier, and analyse the relation between its optimal solution and the condition that the target and predicted class labels are independent. Based on these results, we propose the concept of direction MI and give a theoretical proof that it increases monotonically with accuracy when the other free parameter is fixed. Direction MI avoids the situation in which different classifiers have the same MI value. We then explore the evaluation of binary classifiers with a reject option using direction MI, which gives a better trade-off between performance and rejection than the existing MI method. We divide the multi-class classification task into multiple binary classification problems according to the relations between different classes, instead of one class against all other classes, so that direction MI extends to the evaluation of multi-class classifiers.
② Feature selection via mutual information based on Renyi's entropy. Because computing speed is the bottleneck in mining large-scale data sets, we propose a feature selection (FS) algorithm via mutual information based on Renyi's entropy. We propose an approximate estimate of Renyi's entropy that uses the law of large numbers and the characteristics of large-scale data sets, which decreases the computational complexity from O(N^2) to O(N). Based on min-redundancy-max-dependency, we estimate MI with the proposed approximate es...
③ Application of ITL to increase the transparency of ecological data
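The drop from O(N^2) to O(N) can be pictured with a minimal sketch of Renyi's quadratic entropy for one-dimensional data. The first function is the usual Parzen-window estimate, which sums a Gaussian kernel over all N^2 sample pairs. The second averages the kernel over a fixed number of randomly drawn pairs, which the law of large numbers justifies. This random-pair version is an assumption made for illustration, not the estimator derived in the thesis, which this record does not describe; all function names, the window width `sigma`, and the pair budget are hypothetical.

```python
import math
import random

def gaussian_kernel(diff, variance):
    """Zero-mean Gaussian density with the given variance, evaluated at diff."""
    return math.exp(-diff * diff / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def renyi_quadratic_entropy_full(samples, sigma):
    """Parzen-window estimate using all N^2 pairs: O(N^2) kernel evaluations."""
    n = len(samples)
    variance = 2.0 * sigma * sigma  # convolving two Gaussian windows doubles the variance
    potential = sum(gaussian_kernel(x - y, variance) for x in samples for y in samples)
    return -math.log(potential / (n * n))

def renyi_quadratic_entropy_sampled(samples, sigma, num_pairs, rng):
    """Monte Carlo estimate from num_pairs random pairs: O(num_pairs) evaluations.

    Assumption for illustration only; not the thesis's approximation.
    """
    n = len(samples)
    variance = 2.0 * sigma * sigma
    total = 0.0
    for _ in range(num_pairs):
        x = samples[rng.randrange(n)]  # pairs are drawn uniformly with replacement
        y = samples[rng.randrange(n)]
        total += gaussian_kernel(x - y, variance)
    return -math.log(total / num_pairs)

rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(1000)]
h_full = renyi_quadratic_entropy_full(samples, sigma=0.5)
h_fast = renyi_quadratic_entropy_sampled(samples, sigma=0.5, num_pairs=4 * len(samples), rng=rng)
print(h_full, h_fast)
```

With the pair budget proportional to N, the sampled estimate costs O(N) kernel evaluations and carries Monte Carlo error that shrinks as the budget grows.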
Shelf Number: XWLW1519
Other Identifier: 200718014629090
Language: Chinese
Document Type: Dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/6292
Collection: Graduates_Doctoral Dissertations
Recommended Citation
GB/T 7714
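Liu Cantao. Study on Some Issues of Information-Theoretic Classification Learning[D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences, 2010.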
Files in This Item:
File Name/Size: CASIA_20071801462909 (2794KB); Access: not yet open; License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.