Study on the Application of Information Entropy in the Calculation of TCM Data (信息熵在中医数据计算中的应用研究)

Title	信息熵在中医数据计算中的应用研究
Alternative title	Study on information entropy's application in the calculation of TCM data
Author	郑荣尧
Date	2010-05-28
Degree type	Master of Engineering
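Abstract (translated from Chinese)	This research is part of the National Basic Research Program of China (973 Program) project "Basic research on syndrome standardization and its relationship with diseases and formulas". Its main task is to apply entropy-based intelligent computing methods for complex systems to clinical information in order to analyze the complex correlations between Western-medicine physicochemical indicators and the syndrome elements of the TCM system, and to find the symptom combinations that diagnose each syndrome, laying a foundation for the modernization and objective diagnosis of syndromes. To address this task, the work covers three areas:
1. Application of information entropy to the discretization of continuous attributes and to correlation analysis. Entropy-based mutual information is a nonlinear measure of correlation that is now widely used in theory and practice. However, analyzing the correlation between physicochemical indicators and syndrome elements requires assuming that the indicators are normally distributed, and this assumption is questionable when the data set is small. The thesis therefore uses a discretization technique based on information entropy and rough sets to discretize the indicators while preserving as much of their original informational meaning as possible, and then computes the correlation between the resulting discrete variables. This greatly reduces the cost of computing the correlations and avoids the error introduced by the normality assumption.
2. Application of information entropy to TCM syndrome research. Much of the literature discusses clustering, but there is almost no discussion of cluster size (the number of elements a cluster contains). The thesis therefore explores an unsupervised clustering algorithm, based on contribution degree, that adaptively selects the number of elements in a syndrome. The method is applied to clinical data on depression, chronic hepatitis B, and chronic renal failure to extract the TCM syndromes present in these diseases, offering a new methodological approach to syndrome standardization.
3. Application of information entropy to feature selection. The four diagnostic methods of TCM yield a large amount of clinical information, but recording all of it and using it for syndrome differentiation is difficult to collect and does not help differentiation. Following the principle that a good feature subset should be highly correlated with the class attribute while the attributes within the subset are as weakly associated with each other as possible, the thesis uses a mutual-information-based feature extraction method to select an optimal subset of the four-diagnosis information, and classifies the extracted symptom subset with a support vector machine. Finally, it introduces an incremental learning algorithm based on support vector machines to cope with the difficulty that continually growing clinical data poses for classifier training.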
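Item 1 of the abstract can be illustrated with a minimal sketch. It assumes a single entropy-minimizing binary cut as the discretization step, which is a simplification and not the entropy plus rough-set technique the thesis describes. The function names and the sample values are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equal-length discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def best_cut_point(values, labels):
    """Cut point on a continuous variable that minimizes the weighted
    class entropy of the two resulting bins (assumed simplification)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_h = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if h < best_h:
            best_cut, best_h = (pairs[i - 1][0] + pairs[i][0]) / 2, h
    return best_cut


def discretize(values, cut):
    """Map each continuous value to bin 0 (<= cut) or bin 1 (> cut)."""
    return [int(v > cut) for v in values]


# Hypothetical data: one physicochemical indicator and one binary syndrome element.
indicator = [1.0, 1.2, 1.4, 3.1, 3.3, 3.6]
element = [0, 0, 0, 1, 1, 1]
cut = best_cut_point(indicator, element)
print(cut, mutual_information(discretize(indicator, cut), element))
```

Because the mutual information is computed from counts over the discretized bins, no distributional assumption about the indicator is needed.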
English abstract	This research was supported by the National Basic Research Program of China (973 Program) project "Basic research on standards of the syndrome defined by Chinese medicine and its correlation with disease and formulas". The main task is to analyze the complex correlations between physicochemical parameters and TCM syndromes, and to discover the symptoms relevant to each syndrome, using entropy-based intelligent computing methods. To meet the project requirements, the following work was completed:
1. Application of entropy to discretization and association analysis. Entropy-based mutual information is a nonlinear measure of association that is extensively applied in theory and practice. However, assuming that the variables are normally distributed is questionable, particularly with small samples. To address this, we discretize the variables using rough sets and entropy while preserving their original information, and then calculate the correlation between the resulting categorical variables. This both reduces the computational cost and avoids the error introduced by the normality assumption.
2. Application of entropy to syndrome research. Many approaches to clustering have been proposed, but there is almost no research on how to determine cluster size (the number of elements in a cluster). To address this, we propose a technique, based on contribution degree, that adaptively selects the number of symptoms in a syndrome. We apply the method to data on depression, chronic renal failure (CRF), and chronic hepatitis B to extract TCM syndromes. The method offers a new line of thought for syndrome standardization.
3. Application of entropy to feature selection. The four diagnostic methods yield a large amount of clinical information. Collecting all of it is difficult, and doing so does not benefit syndrome differentiation. We therefore study feature selection based on mutual information to select an optimal subset of symptoms, and feed the selected subset to an SVM classifier for objective syndrome differentiation. An incremental learning algorithm for SVM is introduced to address the difficulty that growing clinical data poses for classifier training.
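The feature-selection principle in item 3 (high relevance to the class, low association among the selected features) can be sketched as a greedy search. This is an illustrative criterion in the style of max-relevance, min-redundancy selection. The abstract does not give the thesis's exact scoring rule, and the function names, symptom names, and data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equal-length discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def select_features(features, target, k):
    """Greedily pick up to k feature names.

    features: dict mapping a feature name to its list of discrete values.
    Score (assumed): relevance I(f; target) minus the mean mutual
    information between f and the features already selected.
    """
    selected = []
    remaining = list(features)  # list keeps tie-breaking deterministic

    def score(name):
        relevance = mutual_information(features[name], target)
        if not selected:
            return relevance
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical binary symptom indicators for eight patients.
syndrome = [0, 0, 0, 0, 1, 1, 1, 1]
symptoms = {
    "symptom_a": [0, 0, 0, 1, 1, 1, 1, 1],
    "symptom_a_copy": [0, 0, 0, 1, 1, 1, 1, 1],  # fully redundant with symptom_a
    "symptom_b": [0, 0, 1, 0, 1, 1, 0, 1],
}
print(select_features(symptoms, syndrome, 2))
```

In this toy data the duplicated symptom is skipped in the second round because its redundancy with the first pick outweighs its relevance. The selected subset would then be passed to a classifier such as an SVM, which is not shown here.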
Keywords	Information entropy; discretization; contribution degree; feature selection; support vector machine (SVM)
Language	Chinese
Document type	Thesis
Identifier	http://ir.ia.ac.cn/handle/173211/7511
Collection	Graduates_Master's theses
Recommended citation (GB/T 7714)	郑荣尧. 信息熵在中医数据计算中的应用研究[D]. 中国科学院自动化研究所. 中国科学院研究生院,2010.

Files in this item
File name/size	Document type	Version type	Access	License
CASIA_20072801462805 (835KB)			Restricted access	CC BY-NC-SA