Research on Fine-Grained Image Classification Algorithms under Large Spatial Transformations Based on Deep Learning

	王军鹏 (Wang Junpeng)
	2020-05-27
Pages	95
Degree type	Master's
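Chinese abstract (translated)	With the continued development of deep learning, image classification algorithms have surpassed the discriminative ability of the human eye on several datasets. Driven by the needs of real-world applications, fine-grained image classification has also attracted growing attention. The task is to distinguish the subcategories within a single parent category, and it can support research in specialized domains. It poses two main difficulties. First, samples within one category look alike, so the differences between images of that category arise mainly from spatial transformations of the samples. Second, the differences between categories generally show up only in the parts of the samples. These two difficulties make ordinary image classification algorithms unsuitable for the task. This thesis addresses both difficulties by designing algorithms that combine several convolutional neural networks. The main work and contributions are as follows:
1. Investigating the effect of spatial transformation on classification. To study how spatial transformation affects neural network performance, we design classification experiments under spatial perturbation on the MNIST handwritten digit dataset. Affine perturbations of different magnitudes are added to the original images in a controlled, quantitative way, and the change in network performance is observed. The results show that (1) neural networks are robust to small basic spatial transformations of the samples; (2) convolutional neural networks resist spatial transformation better than fully connected networks; and (3) complex transformations composed of several basic transformations cause the performance of convolutional networks to drop sharply.
2. Addressing the large intra-class spatial variation of fine-grained samples. To handle this variation we introduce spatial transformer networks (STN). To address the boundary loss problem that arises when STN is applied to fine-grained datasets, we propose an intersection area (IA) loss and argue for its feasibility theoretically. Classification experiments on several datasets yield the optimal IA loss coefficient and confirm the performance advantage of the model.
3. Designing two classification algorithms that address both difficulties of fine-grained classification. We introduce the bilinear network (Bilinear CNN, BCNN), which is suited to fine-grained classification. Bilinear pooling uses an outer product to capture the correlations between different channels and produces discriminative features. Combining the bilinear network with the improved STN gives a new network structure, ST-BCNN. To capture non-linear relationships between channels, a polynomial kernel function is introduced into ST-BCNN, giving Kernel ST-BCNN. Experiments on three datasets determine the optimal degree of the polynomial kernel and confirm the advantages of the models over other classical methods.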
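The affine perturbation described in the first contribution can be illustrated with a minimal sketch. It is not the thesis code: the function names `affine_matrix` and `warp_nearest`, the centre-of-image convention, and nearest-neighbour sampling are all assumptions made for illustration, and the example uses only the Python standard library.

```python
import math


def affine_matrix(angle_deg=0.0, scale=1.0, tx=0.0, ty=0.0):
    """Build a 2x3 affine matrix from rotation, isotropic scale and translation.

    Hypothetical helper: the thesis does not specify its parameterisation.
    """
    a = math.radians(angle_deg)
    c, s = math.cos(a) * scale, math.sin(a) * scale
    return [[c, -s, tx], [s, c, ty]]


def warp_nearest(image, matrix):
    """Warp a 2D list about its centre with inverse mapping.

    Each output pixel looks up its source pixel (nearest neighbour).
    Output pixels whose source falls outside the image stay 0, so content
    pushed past the border is lost.
    """
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    (a, b, tx), (c, d, ty) = matrix
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("affine matrix is not invertible")
    # Inverse of the 2x2 linear part.
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Undo the translation, then the linear part, around the centre.
            u, v = x - cx - tx, y - cy - ty
            sx = ia * u + ib * v + cx
            sy = ic * u + id_ * v + cy
            j, i = math.floor(sx + 0.5), math.floor(sy + 0.5)
            if 0 <= i < h and 0 <= j < w:
                out[y][x] = image[i][j]
    return out


# A single bright pixel in the centre of a 3x3 image.
img = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
small_shift = warp_nearest(img, affine_matrix(tx=1.0))  # pixel moves one column
large_shift = warp_nearest(img, affine_matrix(tx=2.0))  # pixel leaves the image
```

Sweeping `angle_deg`, `scale`, `tx` and `ty` over increasing ranges is one way to produce perturbations of graded magnitude. The `large_shift` case also shows how content can disappear at the image border under a large transformation.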
English abstract	With the development of deep learning methods, image classification algorithms have surpassed human visual ability on several datasets. Driven by the needs of practical applications, fine-grained image classification has also attracted growing attention. Fine-grained image classification means distinguishing the subcategories within a single parent category, and it can benefit research in specialized fields. The task poses two main difficulties. First, samples of the same category look alike, so the differences between their images are caused mainly by spatial transformation. Second, the differences between categories lie only in the parts of the samples. These difficulties make common classification algorithms unsuitable for the task. To address them, this thesis designs novel algorithms based on several Convolutional Neural Networks (CNN). The main work and contributions of this thesis are summarized as follows:
1. Exploring the effect of spatial transformation on classification performance. We design classification experiments with spatial disturbances on the MNIST dataset. We add affine disturbances of different amplitudes to the original images in a quantitative way and observe the performance of the neural networks. The results show that (1) neural networks are robust to small spatial disturbances of the samples; (2) CNNs resist spatial disturbances better than fully connected neural networks; and (3) complex transformations formed by combining several basic transformations degrade CNN performance sharply.
2. Solving the problem of large intra-class spatial variation in fine-grained samples. To handle this variation, we introduce Spatial Transformer Networks (STN). To solve the boundary loss problem of STN on fine-grained image classification datasets, we design an intersection area (IA) loss and justify its feasibility theoretically. Experiments on several datasets yield the optimal value of the IA loss coefficient and verify the advantages of our method.
3. Designing two classification algorithms that address both difficulties of fine-grained classification. We introduce Bilinear CNN (BCNN), which is suited to the fine-grained classification task. BCNN captures the relationships between different channels with an outer product and produces discriminative features. We combine the improved STN with BCNN to obtain a novel CNN structure, ST-BCNN. To capture non-linear relationships between channels, we introduce a polynomial kernel into ST-BCNN to obtain Kernel ST-BCNN. Experiments on three datasets determine the optimal degree of the polynomial kernel and verify the advantages of our models over other classical methods.
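The outer-product pooling mentioned in the third contribution can be sketched as follows. This is an illustration under stated assumptions, not the thesis implementation: `bilinear_pool`, `polynomial_channel_kernel` and `normalise` are hypothetical names, the signed square root and L2 normalisation follow common BCNN practice and are not confirmed by the abstract, and the polynomial variant is only one plausible reading of "introducing a polynomial kernel".

```python
import math


def bilinear_pool(feats_a, feats_b):
    """Sum the outer products of two feature streams over spatial locations.

    feats_a and feats_b are lists with one channel vector per location.
    Entry (i, j) of the result is the inner product, across locations,
    of channel i in stream A and channel j in stream B.
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("both streams need the same number of locations")
    ca, cb = len(feats_a[0]), len(feats_b[0])
    pooled = [[0.0] * cb for _ in range(ca)]
    for fa, fb in zip(feats_a, feats_b):
        for i in range(ca):
            for j in range(cb):
                pooled[i][j] += fa[i] * fb[j]
    return pooled


def polynomial_channel_kernel(feats_a, feats_b, degree=2, offset=1.0):
    """Hypothetical non-linear variant: (inner product + offset) ** degree.

    With degree=1 and offset=0 this reduces to plain bilinear pooling.
    """
    linear = bilinear_pool(feats_a, feats_b)
    return [[(v + offset) ** degree for v in row] for row in linear]


def normalise(matrix):
    """Flatten, apply a signed square root, then L2-normalise (assumed step)."""
    flat = [v for row in matrix for v in row]
    flat = [math.copysign(math.sqrt(abs(v)), v) for v in flat]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Two spatial locations, two channels each; the same stream is used twice.
feats = [[1.0, 2.0], [3.0, 4.0]]
descriptor = normalise(bilinear_pool(feats, feats))
kernel_descriptor = normalise(polynomial_channel_kernel(feats, feats, degree=2))
```

In a real network the two streams would be convolutional feature maps, and the flattened descriptor would feed a linear classifier. The `degree` argument stands in for the polynomial degree that the abstract reports tuning experimentally.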
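Keywords	fine-grained image classification; deep learning; spatial transformer networks; bilinear networks; spatial transformation
Language	Chinese
Seven major research directions, sub-direction classification	Image and video processing and analysis
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/39087
Collection	Graduates_Master's theses
Recommended citation (GB/T 7714)	王军鹏 (Wang Junpeng). Research on Fine-Grained Image Classification Algorithms under Large Spatial Transformations Based on Deep Learning [D]. Beijing: University of Chinese Academy of Sciences, 2020.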

Files in this item
File name/size	Document type	Version type	Access type	License
基于深度学习的大空间变换下细粒度图像分类 (5430KB)	Thesis		Restricted access	CC BY-NC-SA