CASIA OpenIR > Laboratory of Brain Atlas and Brain-Inspired Intelligence > Brainnetome Research
Zhang Jinpeng







In recent years, computer vision based on deep convolutional neural networks (CNNs) has made great progress. In image classification in particular, new design ideas and methods keep emerging, driving rapid iteration of model architectures and greatly improving both classification accuracy and efficiency. In CNN classification models, both the feature extractor and the classifier are built from artificial neurons as their basic units, so a unified process of forward feature propagation and gradient back-propagation applies to both, integrating feature extraction and classification into a single process. As a result, the whole network can be trained end to end. During learning, the optimization of the feature extractor and the classifier is driven by the input data and the classification loss, with little human intervention. In the feature extractor, cascaded convolution layers adjust their learnable parameters to improve the semantic expressiveness of the feature maps, providing a high-quality feature space for the subsequent classifier. The classifier, in turn, fits the decision function by adjusting its own learnable parameters. Ensuring effective forward propagation of features and back-propagation of errors makes deep CNN models less prone to underfitting and easier to train. At the same time, regularization methods from machine learning, such as $l_2$ regularization, help prevent CNN models from overfitting and thus improve their generalization.
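
For reference, the standard form of $l_2$ regularization adds a squared-norm penalty on the parameters $\theta$ to the classification loss. With a weight-decay coefficient $\lambda$ and learning rate $\eta$ (notation introduced here for illustration, not taken from the thesis), the objective and the resulting gradient step are:

```latex
\mathcal{L}(\theta) = \mathcal{L}_{\mathrm{cls}}(\theta) + \frac{\lambda}{2}\,\lVert\theta\rVert_2^2,
\qquad
\theta \leftarrow \theta - \eta\left(\nabla_\theta \mathcal{L}_{\mathrm{cls}}(\theta) + \lambda\,\theta\right)
```

The extra $\lambda\theta$ term shrinks every weight toward zero at each step, which limits how closely the network can fit noise in the training set.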

However, current CNN-based classification models face four outstanding problems. First, the mechanisms of CNN feature extraction are not fully understood and the core logic of its operation has not been clearly described, so its working process remains a black box. Second, current CNN classification models reduce feature dimensionality by lowering the resolution of the feature maps stage by stage while increasing the number of channels. As a result, high-resolution feature maps exist only in the shallow layers of a model and cannot be associated with high-level semantic information, even though high-resolution feature maps are very important in other visual tasks such as object detection. Third, the mainstream design scheme is based on the residual network, which still has shortcomings such as limited gradient back-propagation ability, so very deep networks remain difficult to optimize because of vanishing gradients. Fourth, when CNN classification models are used as backbone networks for other visual tasks, the core advantages of CNN features are not fully exploited.

To address these problems, this thesis carries out the following research:

1) Based on Bayesian theory and the KL divergence, we evaluate CNN feature maps and analyse the mechanisms and functions of their main operation units. Experiments show that a CNN improves the distinctiveness and robustness of its features by progressively increasing the KL divergence between classes and reducing the KL divergence within each class. The experiments also reveal the respective roles of network width and depth: the density of class-separability information per feature component tends to saturate as the network widens, and increases steadily as the network deepens. Width and depth work together so that the feature extractor achieves efficient compression of semantic information.
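
The abstract does not specify how the divergences are estimated, so the following is only a minimal sketch, assuming each feature component is modelled as a 1-D Gaussian per class. It compares between-class and within-class KL divergence for a single component. The helper names (`gaussian_kl`, `mean_interclass_kl`, `mean_intraclass_kl`) and the split-half proxy for within-class divergence are illustrative choices, not the author's method.

```python
import math
from itertools import combinations
from statistics import fmean, pvariance


def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(N(mu_p, var_p) || N(mu_q, var_q)) for univariate Gaussians."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


def fit(values, eps=1e-6):
    """Moment-match a 1-D Gaussian; eps keeps the variance strictly positive."""
    return fmean(values), pvariance(values) + eps


def mean_interclass_kl(features_by_class, component):
    """Average symmetric KL between class-conditional Gaussians of one feature component."""
    stats = {c: fit([x[component] for x in xs]) for c, xs in features_by_class.items()}
    kls = []
    for a, b in combinations(stats, 2):
        (ma, va), (mb, vb) = stats[a], stats[b]
        kls.append(gaussian_kl(ma, va, mb, vb) + gaussian_kl(mb, vb, ma, va))
    return fmean(kls)


def mean_intraclass_kl(features_by_class, component):
    """Assumed proxy for within-class divergence: symmetric KL between two halves of each class."""
    kls = []
    for xs in features_by_class.values():
        vals = [x[component] for x in xs]
        half = len(vals) // 2
        (m1, v1), (m2, v2) = fit(vals[:half]), fit(vals[half:])
        kls.append(gaussian_kl(m1, v1, m2, v2) + gaussian_kl(m2, v2, m1, v1))
    return fmean(kls)
```

Under this reading, a component that separates the classes well shows a large inter-class value and a small intra-class value; tracking both quantities layer by layer is one way to observe the trend the experiments describe.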

2) Inspired by the information-flow mechanisms of the biological visual cortex, we construct a CNN image classification model based on multi-scale processing, named ScaleNet. Most current classification models reduce the resolution of their feature maps stage by stage to achieve feature reduction and extract semantic information. ScaleNet, by contrast, performs multi-scale feature extraction at any depth of the network and can maintain high-resolution feature maps even in very deep layers. For example, on the CIFAR datasets ScaleNet provides the final classifier with 32×32 feature maps, the same resolution as the input images. This design allows the high-resolution feature maps in the deep layers of ScaleNet to learn strong semantic representations while also capturing fine-grained visual features.
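
The abstract does not describe ScaleNet's internal block, so the code below is not ScaleNet. It is a minimal, dependency-free sketch of the general idea: a block that mixes in coarser-scale context while returning a map at the input resolution, so stacking blocks never shrinks the map. The use of 2×2 average pooling, nearest-neighbour upsampling, and summation for fusion are assumptions made for illustration.

```python
def avg_pool2(x):
    """2x2 average pooling with stride 2 on a 2-D list (even height and width assumed)."""
    return [
        [(x[2 * i][2 * j] + x[2 * i][2 * j + 1] + x[2 * i + 1][2 * j] + x[2 * i + 1][2 * j + 1]) / 4
         for j in range(len(x[0]) // 2)]
        for i in range(len(x) // 2)
    ]


def upsample(x, factor):
    """Nearest-neighbour upsampling of a 2-D list by an integer factor."""
    return [[x[i // factor][j // factor] for j in range(len(x[0]) * factor)]
            for i in range(len(x) * factor)]


def multiscale_block(x, levels=2):
    """Hypothetical block: sum the input with pooled-then-upsampled copies of itself.

    The output has the same resolution as the input, so blocks can be stacked to any
    depth without shrinking the map (side lengths must be divisible by 2**levels).
    """
    out = [row[:] for row in x]
    coarse = x
    for level in range(1, levels + 1):
        coarse = avg_pool2(coarse)              # next coarser scale
        up = upsample(coarse, 2 ** level)       # back to the input resolution
        out = [[o + u for o, u in zip(o_row, u_row)] for o_row, u_row in zip(out, up)]
    return out
```

Stacking such blocks on a 32×32 input keeps a 32×32 map at every depth, whereas a design that halves the resolution at each of three stages would hand the classifier a 4×4 map.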

3) Inspired by ResNet and DenseNet, we design a multipath skip-connection structure that can be combined with residual learning to form a multipath residual structure. Whereas the original residual module has a single input/output path, the proposed module has three input/output paths, and each path can form a residual connection spanning multiple layers, which effectively improves gradient back-propagation through the network. Experimental analyses show that ScaleNet equipped with this structure achieves better classification performance than with the original single-path residual structure.
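
The abstract does not detail how the three-path module is wired. As a rough, hypothetical illustration of why extra skip paths help back-propagation, the sketch below treats a network as a chain of units in which unit i receives connections from units i - s for every span s in a chosen set. It counts the distinct input-to-output paths a gradient can follow and the length of the shortest one. The span sets, `(1,)` for a plain chain, `(1, 2)` for single shortcuts, and `(1, 2, 3)` for a multipath variant, are assumptions for illustration only.

```python
from collections import deque


def shortest_gradient_path(depth, spans):
    """Fewest hops from the output unit (index `depth`) back to the input unit (index 0),
    where unit i is connected to unit i - s for each span s. Returns None if unreachable."""
    dist = {depth: 0}
    queue = deque([depth])
    while queue:                        # breadth-first search from the output
        i = queue.popleft()
        for s in spans:
            j = i - s
            if j >= 0 and j not in dist:
                dist[j] = dist[i] + 1
                queue.append(j)
    return dist.get(0)


def count_gradient_paths(depth, spans):
    """Number of distinct input-to-output paths, by dynamic programming over unit indices."""
    paths = [0] * (depth + 1)
    paths[0] = 1
    for i in range(1, depth + 1):
        paths[i] = sum(paths[i - s] for s in spans if i - s >= 0)
    return paths[depth]
```

For a 12-unit stack, the three span sets give shortest paths of 12, 6, and 4 hops and 1, 233, and 927 distinct paths respectively: more skip paths mean the gradient passes through fewer multiplications on its shortest route back to early layers.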

4) Based on the activation characteristics of CNN feature maps, we propose an improved image object detection algorithm named Hot Anchors. The method uses the activation value of each pixel on the feature maps to identify the pixels at which anchor boxes should be placed, turning anchor generation in object detection algorithms from uniform sampling into heuristic sampling based on CNN activations. As a result, the number of sampled anchor boxes is greatly reduced, which lowers the computational cost of the algorithm and improves detection accuracy.
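
As a minimal sketch, assuming the per-pixel activations have already been reduced to a single 2-D map (for example by averaging over channels) and that a fixed threshold decides which pixels are "hot", the code below contrasts uniform anchor placement with activation-guided placement. The threshold rule, the stride, and the anchor sizes and aspect ratios are illustrative assumptions, not the settings used by Hot Anchors.

```python
from itertools import product


def anchor_cells_uniform(activation):
    """Every cell of the feature map, as in conventional uniform anchor sampling."""
    return [(r, c) for r in range(len(activation)) for c in range(len(activation[0]))]


def anchor_cells_hot(activation, threshold):
    """Only cells whose activation reaches `threshold` (a hypothetical selection rule)."""
    return [(r, c) for r, c in anchor_cells_uniform(activation) if activation[r][c] >= threshold]


def make_anchors(cells, stride, sizes, ratios):
    """Anchor boxes (x1, y1, x2, y2) in image coordinates, centred on each selected cell.

    `ratio` is width / height; every box keeps an area of size**2.
    """
    boxes = []
    for (r, c), size, ratio in product(cells, sizes, ratios):
        cx, cy = (c + 0.5) * stride, (r + 0.5) * stride   # cell centre in the image
        w, h = size * ratio ** 0.5, size / ratio ** 0.5
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes
```

With 3 sizes and 3 aspect ratios, a 4×4 map yields 144 anchors under uniform sampling; if only two cells pass the threshold, 18 anchors remain for the detector to score.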

In summary, the above research covers the analysis of CNN feature extraction mechanisms, design improvements to CNN classification models, and the reuse of CNN classification models in other visual tasks. Research on the mechanisms of image feature extraction helps develop better feature extraction methods, which in turn helps design better image classification models, and better use of classification models helps improve performance on other visual tasks. These three lines of research therefore build on one another logically. They represent important progress on the basic mechanisms, design ideas, and application methods of CNN classification models, and are of great significance for current research on image feature extraction and classification.

GB/T 7714
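Zhang Jinpeng. 基于深度卷积网络的有监督图像特征提取和分类研究 (Research on Supervised Image Feature Extraction and Classification Based on Deep Convolutional Networks)[D]. Institute of Automation, Chinese Academy of Sciences, 2019.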
File name/size | Document type | Version type | Access type | License
基于深度卷积网络的有监督图像特征提取和分 (10324 KB) | Thesis | | Open access | CC BY-NC-SA
