Unsupervised Ranking Based on Principal Curves and Its Application to Comprehensive Evaluation of Multi-Attribute Objects

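	Unsupervised Ranking Learning Based on Principal Curves and Its Application to Comprehensive Evaluation (基于主曲线的无监督排序学习及其在综合评价中的应用)
Alternative title	Unsupervised Ranking Based on Principal Curves and Its Application to Comprehensive Evaluation of Multi-Attribute Objects
	李纯果 (Li Chunguo)
	2014-11-27
Degree type	Doctor of Engineering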
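Chinese abstract	Comprehensive evaluation is an important topic that receives close and widespread attention across society, and it is also one of the most contested. From a machine learning perspective, comprehensive evaluation is an unsupervised ranking problem, because the true ranking of the objects being ranked is hard to obtain. Unsupervised ranking models face two main challenges: (1) without ground-truth rankings, how can the soundness of a ranking model be ensured? (2) How should the nonlinear complexity of the ranking model be chosen? Within a framework in which models are driven jointly by data and knowledge, this thesis starts from the observations of the ranked objects on multiple evaluation indicators, combines them with prior knowledge about ranking, and systematically discusses the evaluation of unsupervised ranking and the embedding of domain knowledge into ranking models. The main contributions are summarized as follows: 1. The thesis first states five important meta-rules for ranking rules: scale and translation invariance, strict monotonicity, compatibility with linearity and nonlinearity, smoothness, and explicitness of parameter size. The purpose of these meta-rules is to make a ranking rule preserve, as far as possible, the latent true ranking of the observations in the multi-indicator data space, so that the ranking result is sound. The five meta-rules can also serve as a high-level criterion for evaluating and comparing the results of different ranking models, which opens a new approach to evaluating unsupervised ranking. 2. The thesis proposes a class of ranking models that satisfy all five meta-rules: the ranking principal curve (RPC). An RPC fits the skeleton of the observed data with a strictly monotone space curve that serves as the "ranking axis"; it is a nonlinear generalization, in the original data space, of the first principal component. The RPC model embeds nonlinear domain knowledge structurally into the ranking model, constrains the model parameters so that the five meta-rules hold, and selects the nonlinear complexity of the model according to the size of the model parameters. The thesis also proposes an RPC model parameterized by a cubic Bézier curve. Constraining the control points of the cubic Bézier curve makes it satisfy the strict monotonicity meta-rule, while the cubic Bézier curve satisfies the other four meta-rules automatically, and the number of control points bounds the nonlinear complexity of the curve. The existence of this RPC model can be proved theoretically, and the convergence of its learning algorithm is also proved theoretically. The proposed model outputs a continuous evaluation score for each ranked object, which carries more information about the ranking distribution than a direct ranking output. The model also has a clear geometric meaning, accommodates both linear and nonlinear relationships, and has its complexity fixed by the control points of the curve, which naturally sidesteps the model-size selection problem. 3. For ranking the importance of evaluation indicators, the thesis exploits the strictly monotone relationship between indicators and ranking scores and proposes a two-stage procedure. Stage I uses the Spearman rank correlation coefficient (SRCC) between indicators to select, from the observed attributes, the indicators relevant to the evaluation, and learns a ranking model on the observations of the selected indicators. Stage II uses the extended Fourier amplitude sensitivity test (EFAST) to compute the main effect of each indicator on the ranking model, which measures how important that indicator is to the ranking. The main results are applied to the real-world comprehensive evaluation of countries, academic journals, and universities. The resulting rankings of objects and of indicators are fairly consistent with users' perception, which indicates that the proposed ranking framework has some practical value.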
English abstract	Comprehensive evaluation is an important research issue in practice. However, it is highly controversial because no ground truth exists. From the viewpoint of machine learning, comprehensive evaluation is an unsupervised ranking problem. Unsupervised ranking faces two critical challenges. One is how to evaluate ranking models independently of ranking labels. The other is how to determine the nonlinear complexity of ranking models. Within the framework of "Data and Knowledge Driven" models, this thesis systematically discusses the evaluation of unsupervised ranking from multi-attribute numerical observations of ranking candidates. The main contributions of this thesis are as follows: 1. Five essential ranking meta-rules, namely Scale and Translation Invariance, Strict Monotonicity, Compatibility of Linearity and Nonlinearity, Smoothness, and Explicitness of Parameter Size, are drawn formally from ranking domain knowledge. A ranking rule that satisfies all five meta-rules can produce a ranking list as close as possible to the latent ground-truth labels. The five meta-rules can also serve as a high-level evaluation of ranking models and provide a label-independent way to compare different unsupervised ranking models. 2. A ranking principal curve (RPC) model, which follows all five essential ranking meta-rules, is presented for the unsupervised ranking of multi-attribute objects. An RPC is the data skeleton passing through the middle of the data cloud, approximated by a strictly monotone curve. It provides a nonlinear "ranking coordinate" in place of the linear ranking coordinate given by the first principal component. For application, the RPC is parameterized by a cubic Bézier curve with control points restricted to a hypercube. The existence of the RPC and the convergence of the RPC learning algorithm are proved theoretically. With the presented RPC models, continuous scores are output for ranking candidates and provide more information than the rankings themselves. Moreover, RPC models can represent both linear and nonlinear relationships, and both this capacity and the nonlinear complexity of the model are determined by the model parameters. 3. For the importance ranking of attributes, a two-phase attribute selection algorithm is proposed based on the knowledge of strict monotonicity between attributes and grading scores for objects. Phase I removes those irrelevant attributes for ranking based on Spea...
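Keywords	Unsupervised Ranking; Principal Curves; Prior Information; Meta-Rules; Comprehensive Evaluation
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6656
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	李纯果 (Li Chunguo). Unsupervised Ranking Based on Principal Curves and Its Application to Comprehensive Evaluation of Multi-Attribute Objects [D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2014.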

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20111801462804 (5249KB)			Not yet open	CC BY-NC-SA
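
Illustrative sketch (not taken from the thesis)

The abstract describes scoring objects along a strictly monotone cubic Bézier curve. The idea can be sketched as follows, under several assumptions that are not in the record: the data are scaled to the unit hypercube, every attribute is "higher is better", strict monotonicity is enforced by the simple sufficient condition that the control points increase coordinate by coordinate, and each object is scored by the curve parameter of its nearest point found on a grid. The function names, the control points, and the grid search are hypothetical; the thesis's learning algorithm for the control points is not reproduced here.

```python
import numpy as np

def bezier(control, s):
    """Evaluate a cubic Bezier curve. control: (4, d) array, s: (m,) array in [0, 1]."""
    p0, p1, p2, p3 = np.asarray(control, dtype=float)
    s = np.asarray(s, dtype=float)[:, None]
    return ((1 - s) ** 3 * p0 + 3 * (1 - s) ** 2 * s * p1
            + 3 * (1 - s) * s ** 2 * p2 + s ** 3 * p3)

def is_strictly_increasing(control):
    """Assumed sufficient condition: control points increase in every coordinate."""
    return bool(np.all(np.diff(np.asarray(control, dtype=float), axis=0) > 0))

def score(X, control, grid_size=1001):
    """Score each row of X by the parameter of its nearest curve point (grid search)."""
    grid = np.linspace(0.0, 1.0, grid_size)
    curve = bezier(control, grid)                       # (grid_size, d)
    X = np.asarray(X, dtype=float)
    sq_dist = ((X[:, None, :] - curve[None, :, :]) ** 2).sum(axis=2)
    return grid[sq_dist.argmin(axis=1)]

# Hypothetical control points in the unit square (two attributes).
control = np.array([[0.0, 0.0], [0.2, 0.6], [0.5, 0.9], [1.0, 1.0]])

# Three hypothetical objects placed exactly on the curve for a simple check.
X = bezier(control, [0.1, 0.5, 0.9])
scores = score(X, control)
ranking = np.argsort(-scores)   # indices of objects from best to worst
```

Because the score is a continuous parameter in [0, 1] rather than a rank position, the gaps between scores are preserved, which is the property the abstract attributes to continuous outputs.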