信息熵在中医数据计算中的应用研究
Alternative Title: Study on information entropy's application in the calculation of TCM data
郑荣尧
Subtype: Master of Engineering
Thesis Advisor: 西广成
2010-05-28
Degree Grantor: Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Place of Conferral: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Degree Discipline: Control Theory and Control Engineering
Keyword: Information Entropy; Discretization; Contribution Degree; Feature Selection; SVM
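Abstract: This research was carried out under the project “Basic research on syndrome standardization and its relationship with diseases and formulas”, funded by the National Basic Research Program of China (973 Program). The main task is to use information-entropy-based intelligent computing methods for complex systems to analyze, from clinical information, the complex correlations between the physicochemical indicators of Western medicine and the syndrome elements of the TCM system, and to identify the symptom combinations that diagnose each syndrome, laying a foundation for the modernization and objective diagnosis of syndromes. To address this task, the work covers three aspects:

1. Application of information entropy to the discretization of continuous attributes and to correlation analysis
Mutual information, which is based on information entropy, is a nonlinear measure of correlation and is widely used in both theory and practice. However, analyzing the correlation between physicochemical indicators and syndrome elements requires assuming that the indicators are normally distributed, an assumption that is questionable when the data set is small. This thesis therefore discretizes the indicators with a technique based on information entropy and rough sets, preserving as much as possible of the information the original indicators carry, and then computes the correlation between the resulting discrete variables. This greatly reduces the cost of computing the correlations and avoids the error introduced by the normality assumption.

2. Application of information entropy to TCM syndrome research
Much of the literature discusses clustering, but almost none of it addresses subset size (the number of elements a subset contains). This thesis therefore explores an unsupervised clustering algorithm, based on contribution degree, that adaptively selects the number of elements in a syndrome. The method is applied to clinical data on depression, chronic hepatitis B and chronic renal failure to extract the TCM syndromes of these diseases, offering a new methodological approach to syndrome standardization.

3. Application of information entropy to feature selection
The four diagnostic methods of TCM yield a large amount of clinical information, but recording all of it and using it for syndrome differentiation is hard to do and does not help differentiation. Following the principle that a good feature subset should be highly correlated with the class attribute while its members are as weakly associated with one another as possible, this work uses a mutual-information-based feature selection method to choose an optimal subset of the four-diagnosis information and classifies the selected symptom subset with a support vector machine (SVM). Finally, an SVM-based incremental learning algorithm is introduced to cope with the difficulty that steadily growing clinical data poses for classifier training.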
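The first item of the abstract, discretizing a continuous indicator and then measuring its association with a syndrome element by mutual information, can be illustrated with a minimal sketch. It is not the thesis's method: the record does not give the rough-set part of the discretization, so a single entropy-minimizing cut point is swapped in here, and the function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equal-length discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def best_cut(values, labels):
    """Threshold that minimises the size-weighted class entropy of the two sides."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_t, best_h = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary can be placed between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if h < best_h:
            best_t, best_h = (pairs[i - 1][0] + pairs[i][0]) / 2, h
    return best_t

# Hypothetical toy data: an indicator value per patient and whether a
# syndrome element is present (1) or absent (0). Not from the thesis.
indicator = [1.0, 1.2, 1.1, 3.0, 3.2, 2.9]
syndrome = [0, 0, 0, 1, 1, 1]

cut = best_cut(indicator, syndrome)
binned = [int(v > cut) for v in indicator]   # two-level discretization
mi = mutual_information(binned, syndrome)    # association in bits
print(cut, binned, mi)
```

Because the mutual information is computed on counts of discrete values, no distributional assumption about the indicator is needed, which is the point the abstract makes about avoiding the normality assumption.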
Other Abstract: This research was supported by the National Basic Research Program of China (973 Program) project “Basic research on standards of the syndrome defined by Chinese medicine and its correlation with disease and formulas”. The main task is to analyze the complicated correlations between physicochemical parameters and TCM syndromes, and to discover the symptoms relevant to each syndrome, using entropy-based intelligent calculation methods. To meet the project requirements, the following work was completed.

1. The application of entropy in discretization and association analysis
Entropy-based mutual information is a nonlinear measure of association that is extensively applied in theory and practice. However, assuming that the variables are normally distributed is doubtful, particularly with small samples. To deal with this problem, we discretize the variables using rough sets and entropy while preserving their original information, and then calculate the correlation between the resulting categorical variables. This not only decreases the computational cost but also avoids the error introduced by the normality assumption.

2. The application of entropy in syndrome research
Many approaches to clustering have been introduced, but there is almost no research on how to determine cluster size (the number of elements in a cluster). To address this, a technique based on contribution degree is proposed that self-adaptively selects the number of symptoms in a syndrome. We apply this method to data on depression, chronic renal failure (CRF) and chronic hepatitis B to extract the TCM syndromes. The method offers a new line of thinking for syndrome standardization.

3. The application of entropy in feature selection
A large amount of four-diagnosis information is available in clinical practice. Collecting all of it is difficult and does not benefit syndrome differentiation. In this thesis, feature selection based on mutual information is studied to select an optimal subset of symptoms, and the selected subset is input to an SVM classifier for objective syndrome differentiation. An incremental SVM learning algorithm is introduced to resolve the difficulty that growing clinical data poses for classifier training.
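The selection principle stated in the abstract (a good subset is highly related to the class while its members are weakly related to one another) can be sketched as a greedy search. The scoring rule below, relevance minus mean redundancy, is an assumption in the spirit of that principle and not necessarily the criterion used in the thesis; the symptom names and data are hypothetical, and the SVM classification step is left out.

```python
import math
from collections import Counter

def entropy(seq):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equal-length discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def select_features(features, target, k):
    """Greedily pick k feature names, each maximising
    relevance to the target minus mean redundancy with those already picked."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        def score(name):
            relevance = mutual_information(features[name], target)
            if not selected:
                return relevance
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical binary symptom records (1 = present) for eight patients.
symptoms = {
    "symptom_a":     [1, 1, 1, 1, 0, 0, 0, 0],
    "symptom_a_dup": [1, 1, 1, 1, 0, 0, 0, 0],  # exact copy of symptom_a
    "symptom_b":     [1, 0, 1, 0, 1, 0, 1, 0],  # independent of symptom_a
}
syndrome = [1, 1, 1, 1, 0, 0, 0, 1]

chosen = select_features(symptoms, syndrome, 2)
print(chosen)
```

In this toy case the duplicated symptom is as relevant as the original but fully redundant with it, so the second pick goes to the weaker but independent symptom instead.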
Shelf Number: XWLW1541
Other Identifier: 200728014628055
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/7511
Collection: Graduates_Master's Theses (毕业生_硕士学位论文)
Recommended Citation
GB/T 7714
郑荣尧. 信息熵在中医数据计算中的应用研究[D]. 中国科学院自动化研究所. 中国科学院研究生院,2010.
Files in This Item:
File Name/Size: CASIA_20072801462805 (835KB)
Access: Not yet open
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.