机器学习中损失函数的若干问题研究 (Research on Several Problems of Loss Functions in Machine Learning)
樊艳波 (Yanbo Fan)1,2
2018-05-24
Degree Type: Doctor of Engineering
Chinese Abstract
"What to learn" is a primary and fundamental research question in machine learning. At the computational level, it corresponds to the design of loss functions, and how well the loss function is designed directly determines the performance of a machine learning model. Given training data, loss function design typically comprises three parts: the design of the individual loss on each sample, the design of the aggregate loss over the sample set, and the design of the model regularizer. Analysis and research on aggregate losses remain scarce: the typical average loss and maximum loss each have their own strengths and weaknesses, and neither adapts well to the complex distributions of real data, such as class-imbalanced distributions. In view of this, this thesis proposes the average top-k (ATk) loss as a new aggregate loss, analyzes its theoretical properties, and shows that it can better adapt to different data distributions. The thesis applies the ATk loss to metric learning, alleviating the inconsistency between the local structures of samples in the input space and in the transformed space, as well as the inconsistency in the difficulty of different samples. On the other hand, optimization strategies based on self-paced learning (SPL) have attracted growing attention in non-convex optimization, but their theoretical analysis remains limited. From the perspective of implicit regularization, this thesis analyzes the learning objective of self-paced learning and the mechanism behind its robustness to noisy data and outliers. The main contributions of this thesis are as follows:
 
1. For aggregate losses, this thesis analyzes the strengths and weaknesses of the average loss, the maximum loss, and the top-k (k-th largest) loss, and proposes the average top-k (ATk) loss. The ATk loss is defined as the average of the k largest individual losses over the sample set; it subsumes the average loss and the maximum loss as special cases and is a convex upper bound of the top-k loss (a schematic formulation is given after this list). Compared with the average loss and the maximum loss, the ATk loss adapts better to different data distributions, especially imbalanced and multi-modal data. The ATk loss is a very general aggregate loss: it can be combined with any loss defined on a single sample, and it remains a convex function of the individual losses. The thesis analyzes and derives the relationship between the classification calibration of the ATk loss in binary classification and the value of k, and from this gives a theoretical lower bound on k. Combining the ATk loss with the hinge loss, the thesis proposes the ATk-SVM model and gives its excess error bound. Finally, the effectiveness of the ATk loss is verified on both synthetic and real datasets, for both classification and regression problems.
 
2. To address two inconsistencies in metric learning, namely the inconsistency between the local structures of samples in the input space and in the transformed space, and the inconsistency in the difficulty of different samples, this thesis proposes ATk-DML, a metric learning model based on the average top-k loss. The ATk-DML model maximizes the distances of the k nearest dissimilar sample pairs while maintaining an upper bound on the distances between similar pairs. The thesis proposes an efficient algorithm for solving the ATk-DML model and verifies the model's correctness and effectiveness on both synthetic and real datasets.
 
3. For self-paced learning, this thesis starts from convex conjugacy and proposes the self-paced implicit regularizer. It shows that the optimization procedure of SPL-IR, the self-paced learning model built on self-paced implicit regularizers, corresponds to the minimization of a sequence of latent robust loss functions, and it uses this correspondence to explain why self-paced learning exhibits a degree of robustness to noisy data and outliers. In addition, the thesis analyzes the connection between the SPL-IR model and half-quadratic optimization, and provides a family of self-paced implicit regularizers induced by robust loss functions. Finally, the correctness and effectiveness of the SPL-IR model are verified on both synthetic and real datasets, on matrix factorization and multi-view clustering tasks.
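To make the objects above concrete, here is a schematic formulation in notation of our own choosing (an illustrative sketch, not the thesis's exact formulas; the symbols A, l_[i], Omega, and mu are ours):

    % Overall objective: an aggregate loss A applied to the n individual
    % losses, plus a regularizer Omega on the model parameters theta.
    \min_{\theta}\; \mathcal{A}\bigl(\ell_1(\theta),\dots,\ell_n(\theta)\bigr) + \mu\,\Omega(\theta),
    \qquad \ell_i(\theta) = \ell\bigl(f_\theta(x_i),\, y_i\bigr)

    % ATk aggregate loss: the average of the k largest individual losses,
    % where \ell_{[i]} denotes the i-th largest value among \ell_1,...,\ell_n.
    \mathrm{AT}_k = \frac{1}{k}\sum_{i=1}^{k} \ell_{[i]}(\theta)

    % k = n recovers the average loss and k = 1 the maximum loss; since
    % \ell_{[k]} \le \mathrm{AT}_k, the ATk loss upper-bounds the top-k loss.
    % As a maximum of averages over all size-k subsets, it is convex in the
    % individual losses. For nonnegative losses it also admits the standard
    % variational form for top-k sums, with [t]_+ = \max(t, 0):
    \mathrm{AT}_k = \min_{\lambda \ge 0} \Bigl\{ \lambda + \frac{1}{k}\sum_{i=1}^{n} \bigl[\ell_i(\theta) - \lambda\bigr]_{+} \Bigr\}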
English Abstract
"What to learn" is a fundamental research problem in machine learning. It refers to the designing of loss functions in computation level. The properties of loss functions heavily influence model's performance. Usually, given training data, the overall loss contains: individual loss on each sample, aggregate loss on entire training set and regularizer on model parameters. There are only a few choices when we consider the aggregate loss, and typical options like average loss or maximum loss cannot well adapt to different data distributions, such as imbalanced data. In this thesis, we propose the average top-k (ATk) loss as a new aggregate loss and give a learning theory analysis of it. ATk loss can better adapt to different data distributions. We further implement ATk loss into metric learning to alleviate the inconsistence between local structures in the input space and that in the target space, we also take into consideration the different complexities of different samples. On the other hand, self-paced learning (SPL) has been widely used in non-convex optimization, however, its theoretical analysis is limited. In this thesis, we study its learning objectives from implicit regularization perspective and analyze its robustness to noise and outliers accordingly. The main contributions are as follows
 
1. We analyze the properties of the average loss, the maximum loss, and the top-k loss, and propose the average top-k (ATk) loss as a new aggregate loss. The ATk loss is defined as the average of the k largest individual losses over a training set; it is a natural generalization of the average loss and the maximum loss, and a convex upper bound of the top-k loss. Compared with the average loss and the maximum loss, the ATk loss can better adapt to different data distributions, especially multi-modal and imbalanced data. The ATk loss is a very general aggregate loss: it can be combined with any individual loss function and remains a convex function of the individual losses (see the NumPy sketch after this list). We further study the classification calibration of the ATk loss and accordingly provide a lower bound on k. We propose the ATk-SVM model, which combines the ATk loss with the hinge loss, and study its excess error bound. Finally, we demonstrate the effectiveness of the ATk loss for binary classification and regression on both synthetic and real datasets.
 
2. To alleviate the inconsistency between local structures in the input space and those in the target space, as well as the inconsistency in the difficulty of different samples, we propose the ATk-DML metric learning model based on the ATk loss. The ATk-DML model maximizes the average distance of the k nearest dissimilar pairs while maintaining an upper bound on the distance between similar pairs. We provide an efficient algorithm for the ATk-DML model and show its correctness and effectiveness on both synthetic and real datasets.
 
3. We study a family of regularizers, named self-paced implicit regularizers, based on convex conjugacy, and accordingly propose SPL-IR, a general self-paced learning framework. We show that the learning procedure of SPL-IR can be viewed as sequentially optimizing a series of latent robust loss functions, which offers insight into its robustness to noise and outliers (a minimal sketch of the classic self-paced alternating scheme follows this list). We further analyze the relation between SPL-IR and half-quadratic optimization and accordingly provide a family of self-paced implicit regularizers. We apply SPL-IR to matrix factorization and multi-view clustering, and experimental results on both synthetic and real datasets demonstrate its correctness and effectiveness.
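As a concrete illustration of contribution 1, the sketch below computes the ATk aggregate loss on top of hinge individual losses, in the spirit of ATk-SVM. This is our own minimal NumPy code with hypothetical toy numbers, not code from the thesis:

    import numpy as np

    def average_top_k_loss(individual_losses, k):
        """ATk aggregate loss: the average of the k largest individual losses.

        k = 1 recovers the maximum loss; k = n recovers the average loss.
        """
        losses = np.sort(np.asarray(individual_losses, dtype=float))
        return losses[-k:].mean()  # mean of the k largest values

    # Toy batch of margins y_i * f(x_i) (hypothetical numbers).
    margins = np.array([2.0, 0.7, -0.3, 1.5, 0.1])
    hinge = np.maximum(0.0, 1.0 - margins)  # individual hinge losses

    print(average_top_k_loss(hinge, k=1))           # maximum loss
    print(average_top_k_loss(hinge, k=len(hinge)))  # average loss
    print(average_top_k_loss(hinge, k=2))           # ATk with k = 2

The same top-k selection underlies ATk-DML in contribution 2, applied there to the distances of the k nearest dissimilar pairs rather than to per-sample losses.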
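For contribution 3, here is a minimal sketch of the classic self-paced alternation with the hard sample-selection regularizer, the baseline scheme that SPL-IR reinterprets; under the implicit-regularization view, each round can be read as a minimization step of a latent robust (truncated) loss. The code, data, and parameter values are our own illustrative choices, not the thesis's SPL-IR algorithm:

    import numpy as np

    def self_paced_least_squares(X, y, age=1.0, growth=1.3, n_rounds=10):
        """Classic SPL alternation for least squares with the hard regularizer.

        Step 1: closed-form sample weights v_i = 1 if loss_i < age else 0.
        Step 2: weighted least squares on the currently selected samples.
        The age parameter grows each round, letting harder samples in.
        """
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(n_rounds):
            losses = (X @ w - y) ** 2         # individual losses
            v = (losses < age).astype(float)  # closed-form v* for hard SPL
            V = np.diag(v)
            # Solve (X^T V X) w = X^T V y on the selected samples.
            w = np.linalg.solve(X.T @ V @ X + 1e-8 * np.eye(d), X.T @ V @ y)
            age *= growth                     # anneal the age parameter
        return w

    # Toy usage: linear data with a few gross outliers (hypothetical).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    w_true = np.array([1.0, -2.0, 0.5])
    y = X @ w_true + 0.01 * rng.normal(size=100)
    y[:5] += 10.0  # inject outliers
    print(self_paced_least_squares(X, y))  # close to w_true; outliers excluded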
Keywords: machine learning; aggregate loss; metric learning; self-paced learning; implicit regularizer
Document Type: Doctoral dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/21025
Collection: Graduates / Doctoral Dissertations
Affiliations: 1. Institute of Automation, Chinese Academy of Sciences; 2. University of Chinese Academy of Sciences
First Author Affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended Citation (GB/T 7714):
樊艳波. 机器学习中损失函数的若干问题研究[D]. 北京: 中国科学院大学, 2018.
Files in This Item:
File Name / Size: 机器学习中损失函数的若干问题研究.pdf (4245 KB); Document Type: Thesis; Access: Restricted; License: CC BY-NC-SA