Supervised Image Feature Extraction and Classification Based on Deep Convolutional Networks (基于深度卷积网络的有监督图像特征提取和分类研究)
Zhang Jinpeng
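Subtype: Doctoral
Thesis Advisor: 余山 (Yu Shan)
2019-12-03
Degree Grantor: Institute of Automation, Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Name: Doctor of Engineering
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: feature extraction, image classification, object detection, convolutional neural network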
Abstract

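In recent years, machine vision based on deep convolutional neural networks (CNNs) has advanced substantially. In image classification in particular, new design ideas and methods keep emerging, model architectures iterate quickly, and both classification accuracy and efficiency have improved greatly. In a CNN classification model, the feature extractor and the final classifier are both built from neurons as the basic unit, so they share a unified process of forward feature propagation and backward gradient propagation. Feature extraction and classification are thereby integrated into one process, and the whole network can be trained end to end. During training, the tuning of the feature extractor and the classifier is driven entirely by the input data and the classification loss, without much manual intervention. The cascaded convolution kernels in the feature extractor adjust their learnable parameters to raise the semantic expressiveness of the feature maps step by step; deep features in particular have very strong abstract descriptive power, which gives the subsequent classifier a feature space with good separability. The classifier likewise adjusts its learnable parameters to fit a classification decision function. In a CNN classification model, effective forward feature propagation and backward gradient propagation are what keep a deep model from underfitting and make it easier to train. Combined with regularization methods from machine learning, the model can also be kept from overfitting, which improves its generalization.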

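However, current CNN-based classification models have four prominent problems. First, the principles behind CNN feature extraction are insufficiently understood; its core working logic has not been fully explained, so the process remains a black box. Second, current CNN classification models reduce feature dimensionality by lowering feature-map resolution stage by stage while increasing the number of channels. High-resolution feature maps therefore always sit in the shallow layers and cannot acquire sufficiently high-level semantics, even though such maps matter greatly in other vision tasks such as object detection. Third, designs built on residual networks are now mainstream, yet residual networks still show insufficient forward feature propagation and weak backward gradient propagation, so very deep networks are hard to optimize because of vanishing gradients. Fourth, when a CNN classification model serves as the backbone for other vision tasks, the core advantage of CNN features, namely suppressing background noise and highlighting foreground objects, is not fully exploited.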

Motivated by these problems, this thesis carries out the following work:

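1) Based on Bayes theory and KL divergence, CNN feature maps are evaluated, and the mechanisms and roles of the main functional units are analyzed. Experiments show that, during training, a CNN gradually raises the between-class KL divergence of the feature maps at every layer while lowering their within-class KL divergence, which improves feature distinctiveness and robustness. The experiments also reveal the roles of network width and depth in feature extraction: the density of separability information per feature component tends to saturate as width increases and rises gradually as depth increases. The two work together, allowing the feature extractor to compress and extract separability information efficiently.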

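2) Inspired by the information-flow mechanisms of the biological visual cortex, a multi-scale CNN image classification model named ScaleNet is built on a deconstruction-reconstruction process. Most current classification models achieve feature dimensionality reduction and semantic extraction at the cost of lowering feature-map resolution stage by stage. ScaleNet instead performs multi-scale feature extraction at any depth and keeps feature maps at high resolution even in the deepest layers. For example, it can supply the final classifier on the CIFAR datasets with feature maps at a very high resolution of 32x32. This structure lets high-resolution feature maps in the deepest layers of ScaleNet learn strong semantic representations and thus capture fine-grained image features.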

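3) Inspired by models such as ResNet and DenseNet, a multipath cross-layer connection structure is designed. Combined with residual learning, it forms a multipath residual connection structure. Whereas the original residual module has a single input/output path, the improved module has three, and each path forms a residual connection spanning multiple layers, which strengthens the network's gradient back-propagation more effectively. Experiments on ScaleNet show that the multipath residual structure achieves higher classification accuracy than the original single-path residual structure.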

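4) Based on the activation characteristics of CNN feature maps, an image object detection algorithm is improved, yielding the Hot Anchors algorithm. The method uses the activation value at each pixel of the feature map as the criterion and changes anchor-box generation from uniform sampling to heuristic sampling guided by CNN activations. This reduces the number of sampled anchors, improves computational efficiency, and raises detection accuracy.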
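
The between-class and within-class KL divergence comparison in item 1) can be illustrated with a minimal sketch. The abstract does not specify how the thesis estimates the distributions, so everything here is an assumption made for illustration: activations of one feature component are binned into smoothed histograms, between-class KL is averaged over ordered class pairs, and within-class KL compares two halves of each class's samples. The names `histogram`, `kl_divergence`, `between_class_kl`, and `within_class_kl` are hypothetical.

```python
import math
from itertools import permutations


def histogram(values, bins=8, lo=0.0, hi=1.0, eps=1e-6):
    """Normalized histogram of scalar activations, smoothed so no bin is zero."""
    counts = [eps] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def between_class_kl(acts_by_class):
    """Mean KL over all ordered pairs of distinct classes (assumed estimator)."""
    hists = {c: histogram(v) for c, v in acts_by_class.items()}
    pairs = list(permutations(hists, 2))
    return sum(kl_divergence(hists[a], hists[b]) for a, b in pairs) / len(pairs)


def within_class_kl(acts_by_class):
    """Mean KL between the two halves of each class's samples (assumed estimator)."""
    total = 0.0
    for values in acts_by_class.values():
        half = len(values) // 2
        total += kl_divergence(histogram(values[:half]), histogram(values[half:]))
    return total / len(acts_by_class)


# Toy activations of a single feature component for two made-up classes.
toy = {"cat": [0.10, 0.15, 0.20, 0.12], "dog": [0.80, 0.85, 0.90, 0.82]}
print(between_class_kl(toy), within_class_kl(toy))
```

Under this reading, a feature component that separates classes well would show a large between-class value and a small within-class value, which is the trend the abstract reports over the course of training.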

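These topics cover three aspects: analysis of the principles of CNN feature extraction, design optimization of CNN classification models, and reuse of CNN classification models in other vision tasks. Understanding the principles of image feature extraction helps in developing better feature extraction methods, which in turn helps in designing better image classification models, and deeper use of classification models helps improve performance on other vision tasks. The three aspects build on one another logically. They are an important supplement to the basic principles, design ideas, and application methods of CNN classification models and are significant for advancing current research on image feature extraction and classification.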

Other Abstract

In recent years, computer vision based on deep convolutional neural networks (CNNs) has made great progress. In image classification especially, new design ideas and methods keep emerging, driving rapid iteration of model architectures and large gains in classification accuracy and efficiency. In CNN classification models, both the feature extractor and the classifier are constructed with artificial neurons as the basic unit, so a unified process of forward feature propagation and gradient back-propagation can be adopted, which integrates feature extraction and classification into a single process. As a result, the whole network can learn end to end. During learning, the optimization of the feature extractor and the classifier is driven by the input data and the classification loss, without much human intervention. In the feature extractor, the cascaded convolution layers improve the semantic expression of the feature maps by adjusting their learnable parameters, which provides a high-quality feature space for the subsequent classifier. The classifier likewise fits the decision function by adjusting its learnable parameters. In CNN models, effective forward feature propagation and error back-propagation make a deep model less prone to underfitting and easier to train. In addition, regularization methods from machine learning, such as $l_2$ regularization, can prevent CNN models from overfitting and thereby improve their generalization.

However, current CNN-based classification models have four outstanding problems. First, the mechanisms of the CNN feature extraction process are not fully understood, and the core logic of its operation has not been clearly described, rendering its working process a black box. Second, current CNN classification models reduce the feature dimensions by lowering the resolution of the feature maps stage by stage while increasing the number of channels. The high-resolution feature maps therefore always lie in the shallow layers of a model and cannot be associated with high-level semantic information, although they are very important in other visual tasks such as object detection. Third, the mainstream design scheme is based on the residual network. However, the residual network still has problems, such as weak gradient back-propagation, which makes very deep networks difficult to optimize because of vanishing gradients. Fourth, when CNN classification models are used as backbone networks for other visual tasks, the core advantages of CNN features are not fully exploited.

To address these problems, this thesis carries out the following research:

1) Based on Bayes theory and the KL divergence, we evaluate the CNN feature maps and analyse the mechanisms and functions of the main operation units. Experiments show that a CNN improves the distinctiveness and robustness of its features by gradually increasing the KL divergence between classes and reducing the KL divergence within individual classes. The experiments also reveal the roles of network width and depth: the density of separability information on each feature component tends to saturate as the network widens and gradually increases as the network deepens. Width and depth cooperate, so that the feature extractor can compress semantic information efficiently.

2) Inspired by the information flow mechanisms in the biological visual cortex, we construct a CNN image classification model based on a multi-scale process, named ScaleNet. Most current classification models reduce the resolution of the feature maps stage by stage to achieve feature reduction and semantic information extraction. ScaleNet can perform multi-scale feature extraction at any depth of the network and can maintain high-resolution feature maps in its very deep layers. For example, ScaleNet can provide feature maps with a resolution of 32x32 to the terminal classifier on the CIFAR datasets. This design enables the high-resolution feature maps in the deep layers of ScaleNet to learn strong semantic expression while capturing fine-grained visual features.

3) Inspired by ResNet and DenseNet, we design a multipath skip-connection structure. The structure can be combined with residual learning to form a multipath residual structure. Compared with the single-path input/output of the original residual module, the proposed residual module has three input/output paths, and each path forms a residual connection across multiple layers, so it can effectively improve the gradient back-propagation ability of the network. Experimental analyses demonstrate that ScaleNet equipped with this structure achieves better classification performance than with the original single-path residual structure.

4) Inspired by the activation characteristics of CNN feature maps, we propose an improved algorithm for image object detection named Hot Anchors. The method uses the activation value of each pixel on the feature maps to select the pixels at which to place anchor boxes, which changes anchor-box generation in object detection algorithms from uniform sampling to heuristic sampling. As a result, the number of sampled anchor boxes drops substantially, which reduces the computational cost of the algorithm and improves the detection accuracy.
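
The change from uniform to activation-guided anchor sampling in item 4) can be sketched as follows. This is not the thesis's Hot Anchors implementation, whose selection rule the abstract does not give. It assumes a single 2D activation map (for example, a channel-averaged feature map), keeps a fixed top fraction of pixels, and uses illustrative values for stride, anchor sizes, and aspect ratios. The function names `hot_anchor_centers` and `make_anchors` are hypothetical.

```python
def hot_anchor_centers(activation, keep_ratio=0.25):
    """Return (row, col) of the most strongly activated feature-map pixels.

    Assumption: keeping a fixed top fraction stands in for whatever
    activation-based criterion the actual method uses.
    """
    cells = [(val, r, c)
             for r, row in enumerate(activation)
             for c, val in enumerate(row)]
    cells.sort(key=lambda t: t[0], reverse=True)
    k = max(1, int(len(cells) * keep_ratio))
    return [(r, c) for _, r, c in cells[:k]]


def make_anchors(centers, stride=16, sizes=(64, 128), ratios=(0.5, 1.0, 2.0)):
    """Build (x1, y1, x2, y2) boxes in image coordinates around each center."""
    boxes = []
    for r, c in centers:
        # Map the feature-map cell to the center of its image-space patch.
        cx, cy = (c + 0.5) * stride, (r + 0.5) * stride
        for size in sizes:
            for ratio in ratios:  # ratio = height / width, area stays size**2
                w = size / ratio ** 0.5
                h = size * ratio ** 0.5
                boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


# Toy 4x4 activation map with a "hot" 2x2 region in the middle.
toy_map = [[0, 0, 0, 0],
           [0, 5, 6, 0],
           [0, 7, 8, 0],
           [0, 0, 0, 0]]
hot = make_anchors(hot_anchor_centers(toy_map))
uniform = make_anchors([(r, c) for r in range(4) for c in range(4)])
print(len(hot), len(uniform))
```

On the toy map, anchors are generated only around the four hot cells instead of all sixteen, which illustrates how activation-guided sampling shrinks the candidate set before any box scoring takes place.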

In summary, the research above covers the analysis of CNN feature extraction mechanisms, the design improvement of CNN classification models, and the reuse of CNN classification models in other visual tasks. Research on the mechanisms of image feature extraction helps develop better feature extraction methods and, in turn, design better image classification models, and better utilization of classification models helps improve performance on other visual tasks. The three aspects are therefore logically progressive. They contribute to the basic mechanisms, design ideas, and application methods of CNN classification models and are of significance for advancing current research on image feature extraction and classification.

Pages: 91
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/28344
Collection: Brainnetome Center
Recommended Citation
GB/T 7714
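Zhang Jinpeng. 基于深度卷积网络的有监督图像特征提取和分类研究 [Supervised Image Feature Extraction and Classification Based on Deep Convolutional Networks][D]. Institute of Automation, Chinese Academy of Sciences, 2019.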
Files in This Item:
File Name/Size: 基于深度卷积网络的有监督图像特征提取和分 (10324KB)
DocType: Thesis
Access: Open Access
License: CC BY-NC-SA