机器学习中损失函数的若干问题研究 (Research on Several Problems of Loss Functions in Machine Learning)
樊艳波 1,2
Subtype: 工学博士 (Doctor of Engineering)
Thesis Advisor: 胡包钢, Professor; 赫然, Professor; Siwei Lyu, Associate Professor
2018-05-24
Degree Grantor: 中国科学院大学 (University of Chinese Academy of Sciences)
Place of Conferral: Beijing
Keywords: Machine Learning; Aggregate Loss; Metric Learning; Self-Paced Learning; Implicit Regularizer
Abstract
"What to learn" is the first and most fundamental research question in machine learning; at the computational level it corresponds to the design of loss functions, and how well the loss function is designed directly determines the performance of a machine learning model. Given training data, loss function design typically comprises three parts: the design of the individual loss on each sample, the design of the aggregate loss over the sample set, and the design of the model regularizer. Analysis and research on aggregate losses remain scarce: the typical average loss and maximum loss each have strengths and weaknesses, and neither fits well the various complex distributions of real data, such as class-imbalanced distributions. In view of this, this thesis proposes the average top-k (ATk) loss as a new aggregate loss and analyzes its theoretical properties; the ATk loss can better fit different data distributions. The thesis applies the ATk loss to metric learning, alleviating the inconsistency between the local structures of samples in the original space and in the transformed space, as well as the inconsistency in the difficulty of different samples. On the other hand, optimization strategies based on self-paced learning have attracted growing attention in non-convex optimization, yet their theoretical analysis is still very limited. From the perspective of implicit regularization, this thesis analyzes the learning objective of self-paced learning and the mechanism behind its robustness to noisy data and outliers. The main research contributions of this thesis are as follows.
 
1. For the aggregate loss, this thesis analyzes the strengths and weaknesses of the average loss, the maximum loss, and the top-k (k-th largest) loss, and proposes the average top-k (ATk) loss. The ATk loss is defined as the average of the k largest individual losses over the sample set; it subsumes the average loss and the maximum loss as special cases and is a convex upper bound of the top-k loss (a compact formulation is sketched after this item). Compared with the average loss and the maximum loss, the ATk loss can better fit different data distributions, especially imbalanced and multi-distribution data. The ATk loss is a very general aggregate loss: it can be combined with any loss defined on an individual sample, and it remains a convex function of the individual losses. For binary classification, the thesis analyzes and derives the connection between the classification calibration of the ATk loss and the value of k, and thereby gives a theoretical lower bound on k. Combining the ATk loss with the hinge loss, the thesis proposes the ATk-SVM model and gives its excess error bound. Finally, the effectiveness of the ATk loss is verified on both synthetic and real datasets, for both classification and regression problems.
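The definitions above can be written compactly as follows; this is my reconstruction from the prose (the notation, including the decreasing rearrangement, is assumed), not a verbatim statement from the thesis:

```latex
% Let \ell_1, ..., \ell_n be the individual losses and
% \ell_{[1]} \ge \cdots \ge \ell_{[n]} their decreasing rearrangement.
% ATk loss: the average of the k largest individual losses.
\mathrm{AT}_k \;=\; \frac{1}{k}\sum_{i=1}^{k} \ell_{[i]}
% k = n recovers the average loss; k = 1 recovers the maximum loss;
% and \mathrm{AT}_k \ge \ell_{[k]}, i.e. it upper-bounds the top-k loss.

% A standard equivalent form for nonnegative losses, which makes the
% convexity of \mathrm{AT}_k in the individual losses explicit:
\mathrm{AT}_k
  \;=\; \min_{\lambda \ge 0}\Big\{\, \lambda + \frac{1}{k}\sum_{i=1}^{n}\big[\ell_i - \lambda\big]_{+} \Big\},
  \qquad [a]_{+} = \max\{a,\, 0\}.
```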
 
2. To address, in metric learning, the inconsistency between the local structures of samples in the original space and in the transformed space, as well as the inconsistency in the difficulty of different samples, this thesis proposes ATk-DML, a metric learning model based on the average top-k loss. ATk-DML maximizes the average distance between the k nearest dissimilar pairs while maintaining an upper bound on the distances between similar pairs. The thesis proposes an efficient algorithm for solving the ATk-DML model and verifies its correctness and effectiveness on both synthetic and real datasets.
 
3. For self-paced learning, this thesis starts from convex conjugacy and proposes the self-paced implicit regularizer. It shows that the optimization procedure of SPL-IR, the self-paced learning model built on such implicit regularizers, corresponds to minimizing a sequence of latent robust loss functions, and uses this correspondence to explain why self-paced learning is somewhat robust to noisy data and outliers (the generic objective is sketched after this item). In addition, the thesis analyzes the connection between SPL-IR and half-quadratic optimization and provides a family of self-paced implicit regularizers induced by robust loss functions. Finally, the correctness and effectiveness of SPL-IR are verified on both synthetic and real datasets, for matrix factorization and multi-view clustering.
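For reference, the generic self-paced objective that the implicit-regularizer view applies to can be sketched as follows; this is the standard form from the SPL literature, with the latent-loss identity written as I understand the SPL-IR claim, not a verbatim statement from the thesis:

```latex
% Generic self-paced learning objective: jointly over model parameters
% w and per-sample weights v in [0,1]^n, with self-paced regularizer g.
\min_{\mathbf{w},\ \mathbf{v}\in[0,1]^n}\;
  \sum_{i=1}^{n} v_i\,\ell_i(\mathbf{w}) \;+\; g(\mathbf{v};\lambda)

% Implicit-regularization view: minimizing over each v_i first yields a
% latent robust loss, so the alternating scheme implicitly minimizes
% \sum_i F_\lambda(\ell_i(\mathbf{w})) with
F_\lambda(\ell) \;=\; \min_{v\in[0,1]}\ \big\{\, v\,\ell + g(v;\lambda) \,\big\}.
```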
Other Abstract
"What to learn" is a fundamental research problem in machine learning. It refers to the designing of loss functions in computation level. The properties of loss functions heavily influence model's performance. Usually, given training data, the overall loss contains: individual loss on each sample, aggregate loss on entire training set and regularizer on model parameters. There are only a few choices when we consider the aggregate loss, and typical options like average loss or maximum loss cannot well adapt to different data distributions, such as imbalanced data. In this thesis, we propose the average top-k (ATk) loss as a new aggregate loss and give a learning theory analysis of it. ATk loss can better adapt to different data distributions. We further implement ATk loss into metric learning to alleviate the inconsistence between local structures in the input space and that in the target space, we also take into consideration the different complexities of different samples. On the other hand, self-paced learning (SPL) has been widely used in non-convex optimization, however, its theoretical analysis is limited. In this thesis, we study its learning objectives from implicit regularization perspective and analyze its robustness to noise and outliers accordingly. The main contributions are as follows
 
1. We analyze the properties of the average loss, the maximum loss, and the top-k loss, and propose the average top-k (ATk) loss as a new aggregate loss. The ATk loss is defined as the average of the k largest individual losses over a training set; it is a natural generalization of the average loss and the maximum loss, and a convex upper bound of the top-k loss. Compared to the average loss and the maximum loss, the ATk loss can better adapt to different data distributions, especially multi-modal and imbalanced data. The ATk loss is a very general aggregate loss that can be combined with any individual loss, and it remains a convex function of the individual losses (a small numerical sketch follows this item). We further study the classification calibration of the ATk loss and accordingly provide a lower bound on k. We propose the ATk-SVM model, which combines the ATk loss with the hinge loss, and study its excess error bound. Finally, we demonstrate the effectiveness of the ATk loss for binary classification and regression on both synthetic and real datasets.
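As a minimal numerical sketch of the definition (illustrative code; the function name `atk_loss` and the toy numbers are mine, not from the thesis):

```python
import numpy as np

def atk_loss(individual_losses, k):
    """Average top-k (ATk) aggregate loss: the mean of the k largest
    individual losses. k = n recovers the average loss, k = 1 the
    maximum loss."""
    losses = np.sort(np.asarray(individual_losses, dtype=float))
    return losses[-k:].mean()  # mean of the k largest entries

# Toy set with one large (outlier-like) loss among small ones.
losses = [0.1, 0.2, 0.1, 0.3, 5.0]
print(atk_loss(losses, k=1))  # 5.00: maximum loss, dominated by the outlier
print(atk_loss(losses, k=2))  # 2.65: interpolates between max and average
print(atk_loss(losses, k=5))  # 1.14: average loss
```

Varying k trades off sensitivity to the hardest examples (small k) against averaging over the whole set (large k), which is how the loss adapts to imbalanced or multi-modal data.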
 
2. To alleviate the inconsistency between local structures in the input space and those in the target space, as well as the inconsistency in the difficulty of different samples, we propose the ATk-DML model based on the ATk loss. ATk-DML maximizes the average distance between the k nearest dissimilar pairs while maintaining an upper bound on the distance between similar pairs (a rough sketch of the resulting objective follows this item). We provide an efficient algorithm for the ATk-DML model and show its correctness and effectiveness on both synthetic and real datasets.
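A rough sketch of how an ATk-DML-style objective could be evaluated for a fixed Mahalanobis matrix M; this is my formalization of the prose (the names, the hinge form, and the bound `u` are assumptions), not the thesis's exact model or solver:

```python
import numpy as np

def atk_dml_objective(M, similar_pairs, dissimilar_pairs, k, u=1.0):
    """Illustrative ATk-DML-style objective for a fixed Mahalanobis
    matrix M. It penalizes similar pairs whose squared distance exceeds
    the upper bound u, and rewards a large average distance over the
    k *nearest* (hardest) dissimilar pairs."""
    def sq_dist(x, y):
        d = x - y
        return float(d @ M @ d)  # squared Mahalanobis distance

    # Hinge penalty on similar pairs that violate the upper bound u.
    sim_penalty = sum(max(0.0, sq_dist(x, y) - u) for x, y in similar_pairs)

    # Squared distances of the k nearest (hardest) dissimilar pairs.
    hardest = np.sort([sq_dist(x, y) for x, y in dissimilar_pairs])[:k]

    # Lower is better: bound respected, hard dissimilar pairs pushed apart.
    return sim_penalty - hardest.mean()
```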
 
3. We study a family of regularizers, named self-paced implicit regularizers, based on convex conjugacy, and accordingly propose a general self-paced framework, SPL-IR. We demonstrate that the learning procedure of SPL-IR can be viewed as sequentially optimizing a group of latent robust loss functions, which provides insight into its robustness to noise and outliers (an illustrative loop is sketched after this item). We further analyze the relation between SPL-IR and half-quadratic optimization and accordingly provide a group of self-paced implicit regularizers. We apply SPL-IR to matrix factorization and multi-view clustering, and experiments on both synthetic and real datasets demonstrate its correctness and effectiveness.
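A schematic alternating-minimization loop in the spirit of SPL-IR on a weighted least-squares problem. The Welsch-style weighting v_i = exp(-l_i / lam) is one example drawn from the half-quadratic literature the abstract refers to; the thesis derives a whole family of such implicit regularizers, and every name here is mine:

```python
import numpy as np

def spl_ir_weights(losses, lam):
    """Example implicit-regularizer weighting: v_i = exp(-l_i / lam),
    the weight form associated with the Welsch robust loss in
    half-quadratic optimization. Samples with large losses (likely
    noise or outliers) receive exponentially small weights."""
    return np.exp(-np.asarray(losses) / lam)

def spl_ir_least_squares(X, y, lam=1.0, iters=20):
    """Self-paced weighted least squares: alternate between
    (1) refitting the model under the current sample weights and
    (2) recomputing the weights from the current individual losses."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        losses = (X @ w - y) ** 2             # individual losses
        v = spl_ir_weights(losses, lam)       # implicit-regularizer weights
        Xw = X * v[:, None]                   # row-weighted design matrix
        w = np.linalg.solve(X.T @ Xw + 1e-8 * np.eye(d), Xw.T @ y)
    return w
```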
Document Type: 学位论文 (Doctoral Dissertation)
Identifier: http://ir.ia.ac.cn/handle/173211/21025
Collection: 毕业生_博士学位论文 (Graduates: Doctoral Dissertations)
Affiliation: 1. Institute of Automation, Chinese Academy of Sciences; 2. University of Chinese Academy of Sciences
Recommended Citation (GB/T 7714):
樊艳波. 机器学习中损失函数的若干问题研究[D]. 北京: 中国科学院大学, 2018.
Files in This Item:
File: 机器学习中损失函数的若干问题研究.pdf (4245 KB); DocType: 学位论文 (Thesis); Access: 暂不开放 (restricted); License: CC BY-NC-SA
 
