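基于偏最小二乘的红外光谱降维算法研究 (Research on Dimensionality Reduction Algorithms for Infrared Spectroscopy Based on Partial Least Squares)
Alternative title: Research on Dimensionality Reduction Algorithms of Infrared Spectroscopy Based on PLS
唐亮 (Tang Liang)
Degree type: Doctor of Engineering
Supervisor: 彭思龙 (Peng Silong)
2014-01-11
Degree-granting institution: University of Chinese Academy of Sciences
Place of degree conferral: Institute of Automation, Chinese Academy of Sciences
Major: Pattern Recognition and Intelligent Systems
Keywords: Partial Least Squares; Principal Component Analysis; Linear Discriminant Analysis; Semi-supervised Dimensionality Reduction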
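Abstract: Infrared spectroscopy has developed rapidly because it measures samples non-destructively, without pollution, and quickly, and it is widely used for offline and online inspection in modern industrial production. However, as with the dimensionality reduction problem faced in data mining, many spectral datasets obtained in practice have more dimensions than samples, so traditional machine learning algorithms run into the small-sample-size problem. In addition, the model of an online inspection system is built from the samples already available, but for some special applications, such as the analysis of baijiu (Chinese liquor), the cycle from grain cultivation through fermentation to sample collection is long even when infrared spectroscopy is used, and collecting enough samples for modeling is difficult. How to build a model with strong prediction and generalization ability from very few labeled samples is therefore a problem that real production needs to solve. Against this background, this thesis studies machine learning methods for infrared spectroscopy, introduces the partial least squares (PLS) algorithm, and proposes several algorithms of practical value for dimensionality reduction and data analysis.
The main contributions of this thesis are:
(1) A study of the basic principles of linear discriminant analysis (LDA) and PLS shows that, in a two-class problem with fixed class centers, adding or removing samples does not affect the projection directions of LDA and PLS, so neither method is guaranteed to find the best decision surface. To address this, the thesis proposes a simple fusion of LDA and PLS, and an algorithm that uses PLS to adjust the LDA projection direction. Both algorithms adjust the original decision surface so that classification performance approaches its maximum. Experiments show that a direction can always be found between the projection directions of the two algorithms such that using the projections onto it as classifier input gives better results than the traditional methods.
(2) To address multicollinearity among sample features, the thesis proposes an unsupervised dimensionality reduction algorithm based on PLS regression coefficients. The algorithm stems from the basic idea of principal component analysis (PCA), takes full account of the relationship between each feature and the other features, and uses PLS regression coefficients to mine this information. Experiments show that on datasets with strong multicollinearity the algorithm improves classification accuracy over traditional dimensionality reduction methods, while on datasets with almost no multicollinearity its results are slightly better than or nearly the same as theirs.
(3) For the situation in industrial production where, when a model is first built, only a few carefully selected labeled samples and a large number of unlabeled samples are available, the thesis proposes a PLS-based semi-supervised dimensionality reduction method. The carefully selected labeled samples are processed directly with PLS to obtain one regression coefficient. The unlabeled samples are assumed to be independent and identically distributed and to reflect the overall sample distribution to some extent, so they are fed to the unsupervised PLS algorithm to obtain a second regression coefficient. Fusing the two regression coefficients gives the semi-supervised PLS algorithm. Compared with unsupervised or supervised dimensionality reduction algorithms, it improves prediction accuracy and strengthens the generalization ability of the model, and it still obtains good results when there are very few or no labeled samples.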
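The PLS half of observation (1) can be checked numerically. The sketch below is an illustration, not code from the thesis: it assumes 0/1 class labels, uses the standard fact that the first PLS weight vector is proportional to the transposed centered data matrix times the centered labels, and relies on made-up toy data and a hypothetical helper name `pls_first_weight`.

```python
def pls_first_weight(X, y):
    """Unit-length first PLS weight vector, proportional to Xc^T yc
    (Xc and yc are the column-centered data and the centered labels)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    w = [
        sum((row[j] - x_mean[j]) * (yi - y_mean) for row, yi in zip(X, y))
        for j in range(p)
    ]
    norm = sum(v * v for v in w) ** 0.5
    return [v / norm for v in w]


# Toy two-class data (hypothetical values). Class 0 has mean (1, 0),
# class 1 has mean (5, 4).
X = [[0.0, 0.0], [2.0, 0.0], [4.0, 3.0], [6.0, 5.0]]
y = [0, 0, 1, 1]
w_before = pls_first_weight(X, y)

# Add two class-1 samples placed symmetrically about the class-1 mean (5, 4):
# the class center stays fixed while the shape of the class changes.
X_more = X + [[5.0, 9.0], [5.0, -1.0]]
y_more = y + [1, 1]
w_after = pls_first_weight(X_more, y_more)

print(w_before, w_after)
```

For two classes the weight vector reduces to a multiple of the difference between the class means, so reshaping a class around a fixed center changes only its scale, not its direction.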
Alternative abstract: Infrared spectroscopy is a non-destructive, non-polluting, and fast detection method, and it has been widely used in offline and online industrial production. However, many practical spectral datasets have more features than samples, so traditional machine learning algorithms encounter the small-sample-size problem. In addition, an online detection system is built on the samples already available, but for some special applications, such as liquor analysis, the cycle from grain growing to fermentation and then to sample collection is relatively long even with infrared spectroscopy, and collecting enough samples is difficult. We therefore need to establish a model with strong prediction and generalization ability when only a few samples have labels and the rest do not. In this context, this thesis studies machine learning algorithms for infrared spectra. Based on the partial least squares algorithm, we propose several algorithms that have practical value for data dimensionality reduction and analysis. There are three main innovations in this thesis. Based on the basic principles of linear discriminant analysis (LDA) and the partial least squares (PLS) algorithm, we found that the projection direction of LDA is optimal under the assumption of a Gaussian distribution. Once the center of each class of samples is determined, the projection direction of LDA is also determined (this follows from the basic principle of LDA). In that case, if we add samples that do not affect the class centers but do affect the shape and distribution of the dataset, the projection direction of LDA is no longer optimal. We therefore combine PLS with LDA and propose two improved methods, named LDA-PLS and ex-LDA-PLS. LDA-PLS amends the projection direction of LDA using information from PLS, while ex-LDA-PLS extends LDA-PLS by combining the results of LDA-PLS and LDA, bringing the result closer to the optimal direction through an adjusting parameter. Comparative studies are provided between the proposed methods and traditional dimension reduction methods such as principal component analysis (PCA), LDA, and PLS-LDA. Experimental results show that the proposed methods achieve better classification performance. For the problem of multiple correlations between features, we applie...
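The record does not say how the adjusting parameter enters ex-LDA-PLS, so the following is only a hedged sketch of one way to picture it: a convex combination of the Fisher LDA direction and the first PLS direction, controlled by a hypothetical parameter `alpha`. The function names, the toy data, the restriction to two features (to allow a closed-form 2x2 solve), and the blending rule itself are all assumptions for illustration, not the thesis's method.

```python
def class_means(X, y):
    """Per-class mean vectors for labels 0 and 1."""
    means = {}
    for label in (0, 1):
        rows = [row for row, yi in zip(X, y) if yi == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return means


def unit(v):
    """Scale a vector to unit length."""
    norm = sum(c * c for c in v) ** 0.5
    return [c / norm for c in v]


def pls_direction(X, y):
    # For two classes the first PLS weight is proportional to m1 - m0.
    m = class_means(X, y)
    return unit([a - b for a, b in zip(m[1], m[0])])


def lda_direction(X, y):
    # Fisher direction Sw^{-1} (m1 - m0), written out for two features only.
    m = class_means(X, y)
    s00 = s01 = s11 = 0.0
    for row, yi in zip(X, y):
        d0, d1 = row[0] - m[yi][0], row[1] - m[yi][1]
        s00 += d0 * d0
        s01 += d0 * d1
        s11 += d1 * d1
    det = s00 * s11 - s01 * s01  # assumed non-zero (Sw invertible)
    b0, b1 = m[1][0] - m[0][0], m[1][1] - m[0][1]
    return unit([(s11 * b0 - s01 * b1) / det, (s00 * b1 - s01 * b0) / det])


def blended_direction(X, y, alpha):
    # Hypothetical blend: alpha = 1 gives the LDA direction, alpha = 0 the PLS one.
    w_lda, w_pls = lda_direction(X, y), pls_direction(X, y)
    return unit([alpha * a + (1 - alpha) * b for a, b in zip(w_lda, w_pls)])


# Toy two-class data (hypothetical values).
X = [[0.0, 0.0], [2.0, 1.0], [1.0, 3.0], [4.0, 3.0], [6.0, 4.0], [5.0, 6.0]]
y = [0, 0, 0, 1, 1, 1]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, blended_direction(X, y, alpha))
```

In practice `alpha` would be chosen by validation, for example by scanning a grid of values and keeping the one whose projections classify held-out samples best.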
Other identifier: 201018014628058
Language: Chinese
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/6576
Collection: Graduates_Doctoral Dissertations
Recommended citation
GB/T 7714
唐亮. 基于偏最小二乘的红外光谱降维算法研究[D]. 中国科学院自动化研究所. 中国科学院大学,2014.
Files in this item
File name/size; Document type; Version type; Access type; License
CASIA_20101801462805 (1983KB); access: not currently open; license: CC BY-NC-SA