Mining and Utilizing Weak Label Information in Transfer Learning

	Mining and Utilizing Weak Label Information in Transfer Learning
	Huang Wenzhen (黄文振)
	2017-05
Degree type	Master of Engineering
Abstract	Machine learning, one of the most important branches of artificial intelligence, has been studied in depth and applied widely. Most machine learning methods, however, rest on a key assumption: the training and test tasks come from the same domain, that is, they share the same feature space and label space and follow the same distribution. When this assumption does not hold, performance declines markedly. Transfer learning addresses this problem, chiefly by eliminating the differences between tasks and exploiting the relationships among them. Although transfer learning has developed considerably, many issues remain open. Existing work focuses mainly on differences between feature spaces or between their marginal distributions, and few studies consider differences between label spaces. Methods that align marginal distributions achieve good results but largely ignore the relationship between features and labels; making the alignment supervised could improve them further, and theoretical analysis of such methods is also scarce. This thesis addresses these two issues by mining and utilizing weak label information:

1. Visual aesthetic quality assessment evaluates the quality of an image from an aesthetic perspective. There are two common task types: aesthetic classification, which labels an image as -1 (low quality) or +1 (high quality), and aesthetic regression, which rates an image with a precise numerical score. Regression is of greater practical value, but its training samples are harder to obtain. We therefore use weak label information, namely the class labels of images, to improve the accuracy of the regression model. Based on an "aesthetic consistency" assumption, that the label sets of the two tasks judge aesthetic quality by similar criteria, we construct a special regularization term that links the classification and regression models. With the CUHKPQ aesthetic classification database as the source domain and the AVA database as the target domain, experiments on the aesthetic regression task show that the method reduces prediction error relative to traditional methods.

2. Unsupervised domain adaptation uses labeled source-domain samples to help train a model for a related but different target domain whose samples are unlabeled. Most existing methods align the marginal distributions of the two domains, which does not effectively remove the discrepancy between the distributions of same-class samples. We instead propose a category-wise adaptation method that aligns the conditional distributions, that is, the distributions of the samples of each class in the two domains. Aligning conditional distributions normally requires class labels in both domains, whereas in the unsupervised setting only the source domain is labeled. We therefore mine pseudo-labels for the target samples from this weak label information and use them to estimate and align the conditional distributions. Experiments on several standard domain adaptation datasets show that the method outperforms previous state-of-the-art domain adaptation methods. We also give a detailed theoretical analysis that explains why the model performs better than other models.
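The category-wise alignment idea in item 2 can be illustrated with a minimal sketch. This record does not state how the thesis mines pseudo-labels or measures distribution distance, so everything below is an assumption made for illustration: pseudo-labels are assigned by nearest source class mean, and the conditional discrepancy is the squared distance between per-class feature means (a linear-kernel, class-wise mean discrepancy). The function names `class_means`, `pseudo_labels`, and `conditional_discrepancy` are hypothetical and do not come from the thesis.

```python
import numpy as np


def class_means(features, labels, num_classes):
    """Per-class mean feature vectors; rows of classes with no samples stay NaN."""
    means = np.full((num_classes, features.shape[1]), np.nan)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            means[c] = features[mask].mean(axis=0)
    return means


def pseudo_labels(source_x, source_y, target_x, num_classes):
    """Assumed rule: give each target sample the class of the nearest source class mean."""
    centroids = class_means(source_x, source_y, num_classes)
    # Squared Euclidean distances, shape (n_target, num_classes).
    dists = ((target_x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Classes absent from the source have NaN centroids; never pick them.
    dists = np.where(np.isnan(dists), np.inf, dists)
    return dists.argmin(axis=1)


def conditional_discrepancy(source_x, source_y, target_x, target_pseudo, num_classes):
    """Mean over classes of the squared gap between source and target class means.

    Classes missing from either domain are skipped.
    """
    mu_s = class_means(source_x, source_y, num_classes)
    mu_t = class_means(target_x, target_pseudo, num_classes)
    gaps = ((mu_s - mu_t) ** 2).sum(axis=1)
    return float(np.nanmean(gaps))
```

In a training loop, a quantity like `conditional_discrepancy` would be added to the source classification loss and minimized over the feature extractor, with the pseudo-labels refreshed as the features change. Whether the thesis follows this scheme is not stated in the abstract.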
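Keywords	Transfer learning; unsupervised domain adaptation; deep learning; aesthetic quality assessment
Document type	Thesis
Identifier	http://ir.ia.ac.cn/handle/173211/14827
Collection	Graduates_Master's theses
Author affiliation	Institute of Automation, Chinese Academy of Sciences
First author affiliation	Institute of Automation, Chinese Academy of Sciences
Recommended citation (GB/T 7714)	Huang Wenzhen (黄文振). Mining and Utilizing Weak Label Information in Transfer Learning[D]. Beijing: Graduate University of Chinese Academy of Sciences, 2017.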

Files in this item
File name/size	Document type	Version type	Access type	License
main.pdf (6331KB)	Thesis		Restricted access	CC BY-NC-SA