Research on Representation and Imbalance Problem for Object Classification


Title	物体分类中的特征表达与样本不均衡问题研究
Alternative title	Research on Representation and Imbalance Problem for Object Classification
Author	蔡慧雯 (Cai Huiwen)
Date	2014-05-30
Degree type	Doctor of Engineering
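Abstract (translated from the Chinese)	In recent years, image-based object classification has developed rapidly. In particular, with the rapid growth of data scale and computing resources, deep learning has achieved breakthroughs even on large-scale object classification tasks. These large-scale methods generally assume that samples are sufficient and that their distribution across classes is relatively balanced. In the real world, however, most image object classification tasks face sample imbalance and small-sample problems, for which machine learning methods built on large-scale, balanced samples are no longer suitable. This thesis addresses the sample imbalance and small-sample problems and explores effective solutions in feature representation learning, reconstruction of minority-class samples, and classifier design. The main work and contributions are summarized as follows:

1. Feature representation learning based on cluster-structured local linear discriminant analysis. An object classification problem is usually decomposed into sample collection, feature extraction, and classifier construction. Existing solutions to the small-sample problem mostly concentrate on sample collection and classifier construction, while feature representation generally relies on classical methods. Features are the core element of classification. To learn a feature representation that benefits the small-sample problem, this thesis proposes, for the specific domain of object classification, a feature representation method based on cluster-structured local linear discriminant analysis. The method takes the distribution of sample counts across classes into account in the optimization model for feature representation learning, which helps small-sample classes obtain more discriminative features when training samples are insufficient. Extensive comparative experiments show that the proposed method significantly improves object classification performance in small-sample, imbalanced settings.

2. Sample reconstruction based on feature subspaces. Sample reconstruction is a common way to handle sample imbalance. Studies show that both under-sampling and over-sampling can effectively improve classification under imbalance. However, both techniques randomly replicate samples or reconstruct minority-class samples within a single class. They do not make full use of prior knowledge to estimate the likely distribution of the minority class, and so do not provide a basis for discriminating between minority and majority classes. For imbalanced multi-class problems, this thesis proposes a sample reconstruction method based on feature subspaces, starting from the perspective of feature transfer. First, following the multi-view idea, the feature space is partitioned into several feature subspaces according to some strategy. Then, in each feature subspace, a classifier-based similarity measure is used to search the majority classes for the set of samples most similar to the small-sample class. Finally, an algorithm reconstructs minority-class samples from the sample sets found in the individual feature subspaces. This reconstruction method exploits the discriminative relationships among different information sources to provide sufficient distinguishing information between small-sample and large-sample classes. Experimental results show that, compared with several typical sample reconstruction methods, the proposed method achieves better and more robust object classification.

3. Classifier design based on the code space. Classifier construction methods for small-sample classes usually focus on learning a single classifier; for multi-class problems they decompose the task directly into binary problems in a one-vs-one or one-vs-all manner. This thesis starts from this specific step, the decomposition of a multi-class problem into binary problems, and proposes a classifier construction method based on the code space. The method interprets the decomposition from a coding perspective and automatically learns a new classifier coding according to how the small-sample classes perform in the classifiers. This increases the code distance between small-sample and large-sample classes and reduces the probability that small-sample classes are misclassified at test time. Extensive experimental results show that classifier design based on the code space significantly improves recognition performance on small-sample classes.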
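The coding view of multi-class decomposition mentioned in the third contribution can be illustrated with a small sketch. It is not the thesis's learned coding, which the abstract does not specify; it only builds the two standard decompositions (one-vs-all and one-vs-one) as code matrices and decodes by distance between codewords. The function names and the convention of marking unused classifiers with 0 are assumptions made for this illustration.

```python
from itertools import combinations


def one_vs_all_codes(n_classes):
    """Row r is the codeword of class r: +1 in column r, -1 elsewhere."""
    return [[1 if c == r else -1 for c in range(n_classes)]
            for r in range(n_classes)]


def one_vs_one_codes(n_classes):
    """One column per class pair (a, b): +1 for a, -1 for b, 0 if unused."""
    pairs = list(combinations(range(n_classes), 2))
    return [[1 if r == a else -1 if r == b else 0 for a, b in pairs]
            for r in range(n_classes)]


def code_distance(u, v):
    """Count the columns where both entries are used (non-zero) and disagree."""
    return sum(1 for x, y in zip(u, v) if x != 0 and y != 0 and x != y)


def decode(outputs, codes):
    """Return the class whose codeword is closest to the binary outputs."""
    return min(range(len(codes)), key=lambda k: code_distance(outputs, codes[k]))


if __name__ == "__main__":
    codes = one_vs_all_codes(4)
    # Any two one-vs-all codewords differ in exactly two columns.
    print(code_distance(codes[0], codes[3]))
    # Outputs of four binary classifiers that agree with class 2.
    print(decode([-1, -1, 1, -1], codes))
```

In the one-vs-all matrix every pair of codewords is at distance 2, so a single wrong binary classifier can already produce a tie between two classes. A coding that places a small-sample class farther from the large-sample classes, as the abstract describes, leaves more room for such errors.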
Abstract (English)	In recent years, research on image-based object classification has made rapid progress. With the sharp increase in training data and computing resources, methods such as deep learning have achieved breakthroughs in large-scale object classification. These methods usually assume that the data is sufficient and that the distribution across classes is relatively balanced. In most cases, however, object classification faces imbalanced data distributions and classes with only a small amount of data. Under these circumstances, machine learning approaches built on large-scale, balanced data lose their power. This thesis focuses on the imbalanced data and small data problems and explores effective solutions from three perspectives: feature representation learning, sample reconstruction for small data classes, and classifier design. The main work and contributions are summarized as follows:

1. Representation learning using cluster-based linear discriminant analysis. An object classification system usually consists of three steps: training sample collection, feature representation, and classifier design. Existing solutions to the small data problem focus mostly on training data collection and classifier design, while applying classical feature representation approaches. Feature design is a key factor in classification. To learn a feature representation suited to the small data problem, this thesis proposes to learn discriminative feature representations that adapt to the data distribution. The method applies cluster-based linear discriminant analysis, and its optimization model takes the sample distribution across classes into account, so that it can learn discriminative features for a small data class even when that class has only a few samples. Extensive experiments demonstrate that the proposed approach significantly improves object classification performance in the case of imbalanced data and small data.

2. Subspace-based sample reconstruction method. Sample reconstruction is a common way to address the data imbalance problem. Studies show that both over-sampling and under-sampling can effectively improve the performance of imbalanced data classification. However, over-sampling and under-sampling merely replicate some samples of the small data class or remove some samples of the other classes. These kinds of methods have not made good use of prior knowledge t...
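The two baseline techniques named in the abstract, random over-sampling and random under-sampling, can be sketched as follows. This is only an illustration of the baselines the thesis argues against, not of its subspace-based reconstruction; the function names, the fixed seed, and the choice to balance every class to the largest (or smallest) class size are assumptions made here.

```python
import random
from collections import defaultdict


def group_by_class(samples, labels):
    """Collect the samples of each class label into a list."""
    groups = defaultdict(list)
    for x, y in zip(samples, labels):
        groups[y].append(x)
    return groups


def random_oversample(samples, labels, seed=0):
    """Replicate randomly chosen samples until every class matches the largest."""
    rng = random.Random(seed)
    groups = group_by_class(samples, labels)
    target = max(len(g) for g in groups.values())
    out_x, out_y = [], []
    for y, g in groups.items():
        extra = [rng.choice(g) for _ in range(target - len(g))]
        out_x.extend(g + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    groups = group_by_class(samples, labels)
    target = min(len(g) for g in groups.values())
    out_x, out_y = [], []
    for y, g in groups.items():
        out_x.extend(rng.sample(g, target))
        out_y.extend([y] * target)
    return out_x, out_y


if __name__ == "__main__":
    X = list(range(10))
    y = ["big"] * 8 + ["small"] * 2
    ox, oy = random_oversample(X, y)
    print(oy.count("big"), oy.count("small"))
    ux, uy = random_undersample(X, y)
    print(uy.count("big"), uy.count("small"))
```

Both functions only reuse or discard samples inside each class. Neither uses any information from other classes about where unseen minority samples might lie, which is the limitation the abstract points to.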
Keywords	Cluster-based Linear Discriminative Analysis; Feature Subspace; Code Space; Small Data; Imbalanced Data
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6650
Collection	Graduates_Doctoral Dissertations
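Recommended citation (GB/T 7714)	蔡慧雯 (Cai Huiwen). 物体分类中的特征表达与样本不均衡问题研究 (Research on Representation and Imbalance Problem for Object Classification)[D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2014.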

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20081801462908 (3086KB)			Not yet open	CC BY-NC-SA