Research on Fine-Grained Image Classification Algorithms Based on Attention Mechanism and Feature Fusion


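	Research on Fine-Grained Image Classification Algorithms Based on Attention Mechanism and Feature Fusion
	Qian Yang (钱扬)
	2019-05
Pages	77
Degree type	Master's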
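Chinese abstract (translated)	With the development and application of deep learning, image classification in computer vision has achieved major breakthroughs, and fine-grained image classification, a major branch of image classification, has also made progress. Unlike traditional image classification, fine-grained image classification addresses the problem of distinguishing multiple subcategories under a single parent category, and it has wide applications in ecological protection, automatic vehicle recognition, military automation, and other areas. Because all categories belong to the same parent category, the differences between subcategories are very subtle. At the same time, because object pose and ambient illumination vary and occlusion occurs, samples within each subcategory differ markedly, so the task faces the challenge of small inter-class variance and large intra-class variance. Starting from the characteristics of the task, this thesis analyzes the strengths and weaknesses of existing methods and, combining a multi-attention mechanism with feature fusion, proposes improved algorithms that raise model performance on fine-grained image classification. The main work and contributions are summarized as follows:
1. A neural network visualization algorithm for model understanding and improvement is proposed. Deep neural networks have achieved strong results in computer vision, and new networks keep emerging. However, their internal working mechanism remains unclear, and research on the interpretability and visualization of neural networks has gradually become a focus of attention. Starting from the visualization of the convolution kernels of a convolutional neural network, this thesis proposes a kernel sensitive-region generation network that discovers the correspondence between a kernel's response and local regions of the input image, that is, which regions of the input image most influence the feature-map response. Unlike other kernel visualization methods, it uses an upsampling network and applies a region-preservation strategy and a region-occlusion strategy respectively, so it can generate sensitive regions of arbitrary shape and size. The method offers a new approach to neural network visualization, makes it easier to analyze whether the kernels have effectively memorized the key features of the dataset during training, and gives direction for subsequent network improvement. Experiments on the fine-grained image dataset FGVC-Aircraft demonstrate the effectiveness of the visualization algorithm.
2. A fine-grained image classification algorithm based on a multi-attention mechanism and spatial relation modeling is designed. The differences between categories in fine-grained image classification datasets lie mainly in the differences between the component parts of the objects. Traditional fine-grained classification algorithms mostly start from part detection and generate a large number of part candidate regions in a bottom-up manner, which leads to high computational complexity and redundancy. Inspired by the visual attention mechanism, this thesis models a multi-attention mechanism that locates multiple key regions of the object in a top-down manner, which simplifies the network structure and reduces computation. In addition, to address the lack of spatial relation modeling in existing algorithms, several methods are used to model the spatial relations between parts while the parts are being detected, improving the completeness and discriminability of the part features. Experiments on four fine-grained image classification datasets analyze the effectiveness of the multi-attention mechanism for part localization and show that introducing spatial position features improves classification performance.
3. A feature fusion algorithm guided by deep features is designed. The differences between the parts of objects in different categories of a fine-grained dataset are very subtle. Fine-grained classification therefore requires not only locating the parts of the object but also accurately extracting their detailed features. Because of the stacked hierarchical structure of convolutional neural networks, deep features mainly retain semantic information and lose a large amount of detail, which is unfavorable for fine-grained classification. Combining the characteristics of deep and shallow features, this thesis proposes a feature fusion method guided by deep features: deep features are used to locate the parts coarsely, and a spatial transformer network applies an affine transformation to the features of the part regions on the shallow feature map to obtain detailed part features for classification. Experiments of several kinds on four fine-grained image classification datasets show that this feature fusion method effectively improves classification accuracy.
Finally, based on the methods above, a vision platform that integrates fine-grained image classification algorithms was designed and implemented, with a visualization module added to make the comparison and debugging of algorithms more intuitive.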
English abstract	With the development and application of deep learning, significant breakthroughs have been made in image classification in computer vision. Meanwhile, fine-grained image classification, a major branch of the field, has also made progress. Unlike traditional image classification, fine-grained image classification focuses on recognizing sub-categories. Because these fine-grained categories share the same parent category, there are only subtle differences between them. In addition, because of the diversity of object pose and illumination conditions and the presence of occlusion, objects that belong to the same sub-category can differ noticeably. Small inter-class variance and large intra-class variance are therefore the main challenges for fine-grained image classification. Based on these characteristics, this thesis first analyzes the advantages and disadvantages of current algorithms. Combining a multi-attention mechanism with feature fusion, it proposes improved algorithms that achieve better performance on fine-grained image classification. The main contributions are summarized as follows:
1. A Convolutional Neural Network (CNN) Visualization Algorithm for Model Understanding and Improvement. CNNs have achieved strong performance in computer vision, and a variety of CNNs have been proposed for different tasks. However, it is still unclear why CNNs work so well, and research on the interpretability and visualization of CNNs has gradually become a hotspot. Part of this thesis concentrates on the visualization of convolution kernels and proposes a Filter Sensitive Area Generation Network (FSAGN), which finds the part of the input image that contributes the most to the response of a filter. Unlike other visualization algorithms, the proposed method uses an up-convolutional network to generate sensitive regions of arbitrary shape and size. By applying a sensitive-area reservation strategy and an occlusion strategy respectively, FSAGN is able to localize the key part that a given filter represents. The algorithm helps researchers analyze whether convolutional filters have learned the key features of a dataset during training, which provides guidance for subsequent improvement of the network. Experiments on the FGVC-Aircraft dataset demonstrate the effectiveness of the visualization algorithm.
2. An Algorithm for Fine-Grained Image Classification Based on Multi-Attention and Spatial Relation Modeling. The differences between categories in fine-grained image classification datasets lie mainly in the differences between the components of the objects. Most traditional classification algorithms are based on part detection and generate a large number of region proposals in a bottom-up manner, which results in high computational complexity and redundancy. Inspired by the visual attention mechanism, this thesis models a multi-attention mechanism that locates multiple key parts of the object in a top-down manner, which simplifies the network structure and reduces computation. In addition, considering the lack of spatial relation modeling in current algorithms, several methods are proposed to model the spatial relations between the parts of an object in order to improve the discriminability of the features. Experiments on four fine-grained image classification datasets show the effectiveness of the multi-attention model for part localization and the improvement in classification performance after introducing spatial features.
3. A Feature Fusion Method Guided by High-Layer Features. Fine-grained image classification requires not only locating the components of the object but also accurately extracting the subtle differences between them. Because of the stacked hierarchical structure of CNNs, high-layer features mainly retain semantic information while losing detailed geometric information, which is not conducive to fine-grained image classification. Making use of the advantages of both high-layer and low-layer features, a feature fusion method guided by high-layer features is proposed. The high-layer features are used to locate object parts roughly, and a spatial transformer network then applies an affine transformation to the low-layer features under the guidance of the part locations and outputs the detailed features of the object components. Experiments on four fine-grained image datasets show an improvement in classification accuracy.
Finally, a vision platform that integrates multiple fine-grained image classification algorithms is designed and implemented, which makes the verification, comparison and debugging of algorithms more intuitive.
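As a rough illustration of the third contribution (high-layer features locate a part coarsely, then an affine transform samples that part from the low-layer feature map), the following minimal Python sketch may help. It is not the thesis's implementation: the thesis uses a learned spatial transformer network, whereas this sketch assumes the part centre is simply the peak of a single deep response map and uses a fixed scale. The function names (`peak_location`, `bilinear`, `affine_crop`) and the toy inputs are hypothetical.

```python
import math

def peak_location(deep_map):
    """Return the (x, y) centre of the strongest deep response, normalised to [-1, 1].

    Assumption: a single response map stands in for one attention/part map.
    """
    h, w = len(deep_map), len(deep_map[0])
    _, r, c = max((deep_map[r][c], r, c) for r in range(h) for c in range(w))
    # Map the centre of cell (r, c) to normalised coordinates.
    return (2 * c + 1) / w - 1, (2 * r + 1) / h - 1

def bilinear(fmap, x, y):
    """Bilinearly sample fmap at continuous pixel coordinates (x, y), zero-padded."""
    h, w = len(fmap), len(fmap[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                value += (1 - abs(x - xx)) * (1 - abs(y - yy)) * fmap[yy][xx]
    return value

def affine_crop(shallow, center, scale, out_size):
    """Sample an out_size x out_size patch from the shallow map.

    Uses the affine map [[scale, 0, cx], [0, scale, cy]] on a normalised
    target grid (pixel-centre convention, i.e. align_corners=False style).
    """
    h, w = len(shallow), len(shallow[0])
    cx, cy = center
    patch = []
    for i in range(out_size):
        v = (2 * i + 1) / out_size - 1           # target row in [-1, 1]
        row = []
        for j in range(out_size):
            u = (2 * j + 1) / out_size - 1       # target column in [-1, 1]
            xs, ys = scale * u + cx, scale * v + cy   # source coords, normalised
            px = ((xs + 1) * w - 1) / 2          # normalised -> pixel coords
            py = ((ys + 1) * h - 1) / 2
            row.append(bilinear(shallow, px, py))
        patch.append(row)
    return patch

# Toy example: a 2x2 deep map whose peak lies in the top-right cell guides
# a crop of the top-right quadrant of a 4x4 shallow map.
deep = [[0.1, 0.9], [0.2, 0.3]]
shallow = [[10 * r + c for c in range(4)] for r in range(4)]
patch = affine_crop(shallow, peak_location(deep), scale=0.5, out_size=2)
```

In a trainable setting the affine parameters would be predicted by the network and the sampling would need to be differentiable; here both are fixed so that the geometry is easy to follow.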
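Keywords	fine-grained image classification; attention; feature fusion; deep learning
Language	Chinese
Funding	National Natural Science Foundation of China[91648205]
Seven major research directions: sub-direction	Image and video processing and analysis
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/23864
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems_Robot Theory and Application
Recommended citation (GB/T 7714)	Qian Yang (钱扬). Research on Fine-Grained Image Classification Algorithms Based on Attention Mechanism and Feature Fusion[D]. 1st Floor, Intelligent Building, Institute of Automation, Chinese Academy of Sciences. Institute of Automation, Chinese Academy of Sciences, 2019.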

Files in this item
File name/size	Document type	Version type	Access type	License
毕业论文-钱扬.pdf (3715KB)	Thesis		Open access	CC BY-NC-SA