Research on Compression and Acceleration Methods for Deep Convolutional Neural Networks (深度卷积神经网络压缩与加速方法研究)
阮晓峰
Subtype: Doctoral (博士)
Thesis Advisor: 胡卫明
2021-05-26
Degree Grantor: 中国科学院大学 (University of Chinese Academy of Sciences)
Place of Conferral: 中国科学院自动化研究所 (Institute of Automation, Chinese Academy of Sciences)
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: deep convolutional neural networks; model compression and acceleration; structured pruning; structured sparsity; knowledge transfer
Abstract

In recent years, with the development and application of deep learning algorithms across the field of artificial intelligence (e.g., computer vision, speech recognition, and natural language processing), performance on the related tasks has improved by leaps and bounds. However, in pursuit of higher performance, deep neural networks demand huge numbers of parameters and heavy computation. When these algorithms face resource-constrained embedded or mobile devices, the available computing power falls short, which hinders the deployment of deep learning algorithms in real application scenarios and in turn severely restricts the adoption of artificial intelligence technology. Research on deep neural network compression and acceleration is therefore of substantial academic value and practical significance for the wider deployment of deep learning.

This thesis addresses several open problems in deep neural network compression and acceleration, analyzing them systematically and in depth from three aspects: compression methods, the compression process, and feature learning, and designing sound, feasible solutions for each. The specific research content and contributions are summarized as follows:

(1) A deep convolutional neural network compression method based on an effective unification of decomposition and pruning. Pruning changes the input/output dimensions of convolutional layers, so the last layer of certain special structures cannot be compressed; low-rank decomposition splits one convolutional layer into several, which increases the model's run-time memory consumption. To address these problems, this thesis proposes a compression method that effectively unifies decomposition and pruning. The method constructs a compressible block by decomposing a convolutional layer into two layers: a new basis weight layer and a coefficient weight layer. By imposing sparsity regularization on the coefficient weights, the method jointly optimizes the two objectives of low rank and channel sparsity for each convolutional layer. Extending this compressible construction from a single layer to the whole network, the method achieves an effective, data-driven unification of decomposition and pruning. In addition, merging the redundant layers in the compressed model further reduces network redundancy. On commonly used large- and small-scale datasets and networks, and compared with a variety of mainstream methods, the proposed method obtains competitive results in compression ratio, accuracy, inference time, and run-time memory consumption. It also generalizes well to detection tasks.

(2) A deep convolutional neural network pruning method based on dynamic and progressive sparsity regularization. Because sparsity and pruning alone cannot achieve satisfactory performance, most existing pruning methods require a cumbersome procedure that introduces extra hyper-parameters and training epochs; in addition, some works ignore differences in pruning ratio across layers. To address these problems, this thesis proposes a pruning method based on dynamic and progressive sparsity regularization, which trains from scratch a structured sparse network that satisfies a preset pruning ratio. During training, the sparsity allocation ratios of the different layers are updated dynamically, and a Taylor-series-based channel sensitivity criterion determines the parameters expected to be pruned. Further, by imposing a progressively increasing group sparsity regularization on these expected parameters, a structured sparse network is learned gradually. The method solves the pruning-ratio-based optimization problem with an iterative shrinkage-thresholding algorithm under dynamic sparsity. After training, the redundant parameters are removed directly, yielding a compressed model that meets the preset pruning ratio without any fine-tuning. Experimental results show that the method achieves competitive results on both large- and small-scale datasets, and it also generalizes well to detection tasks.

(3) A deep convolutional neural network pruning method based on dynamic sparsity and enhanced feature learning. Most pruning methods merely use the parameters of a well-trained model to initialize training, leaving the model's own feature representations unexploited. To address this, this thesis proposes a dynamic pruning method with training enhanced by model feature learning. During training, the feature representations of the baseline model and the category information exchanged among multiple compressed sub-networks drive the entire compression task, without supervision from any data labels. Specifically, on the baseline side, the predicted category information and intermediate-layer features output by the baseline (well-trained) model serve as supervision to guide the task learning of the compressed sub-models, strengthening their ability to learn the baseline model's features. On the sub-model side, the outputs of the different compressed sub-models are used to learn from one another, strengthening feature learning among the sub-models. Further, combined with the proposed dynamic structured sparsity regularization, this thesis designs an end-to-end model compression framework that requires no class-label supervision. After training, the redundant parameters are removed directly, with no label-based fine-tuning. The proposed method achieves strong performance across multiple network architectures and datasets.

Other Abstract

In recent years, deep neural networks (DNNs) have developed rapidly and been applied across artificial intelligence (such as computer vision, speech recognition, and natural language processing), with remarkable performance improvements. However, in pursuit of higher performance, DNNs require enormous numbers of parameters and heavy computation. When these algorithms run on resource-constrained embedded or mobile devices, the available computing power is stretched thin, which hampers the real-world application of DNNs and seriously restricts the deployment of artificial intelligence technology. Therefore, research on deep neural network compression and acceleration has important academic value and practical significance for the further adoption of deep learning.
In this thesis, we focus on several problems in current deep neural network compression and acceleration, analyzing them mainly from three aspects: compression methods, the compression process, and feature learning. The specific research content and contributions are summarized as follows:
(1) An effective decomposition and pruning scheme for convolutional neural network compression is proposed. Channel pruning methods change the input-output dimensions of a layer, which may not work well in some blocks, such as element-wise addition blocks; decomposition methods fail to remove all the redundant channels and occupy much run-time memory. To overcome these limitations, we propose an effective decomposition and pruning (EDP) scheme built on a compressed-aware block. Specifically, we embed the compressed-aware block by decomposing the original network layer into two layers: one holds a basis weight matrix and the other a coefficient matrix. By imposing sparsity regularization on the coefficient matrix, the proposed method realizes the joint optimization of low rank and channel sparsity. Applying the compressed-aware block to the whole network, the method unifies decomposition and pruning in a data-driven manner. Moreover, after pruning, the method merges redundant decomposed layers to further reduce redundancy. Experiments on several datasets and a range of network architectures show that the proposed method achieves a high compression ratio with acceptable accuracy degradation and is competitive with state-of-the-art methods in compression rate, accuracy, inference time, and run-time memory. It also generalizes well to detection tasks.
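To make the compressed-aware block concrete, the following is a minimal PyTorch sketch; the names (CompressedAwareBlock, group_sparsity_penalty), the choice of rank, and the exact form of the regularizer are our illustrative assumptions rather than the thesis's implementation. It shows a k x k convolution split into a basis layer and a 1x1 coefficient layer, with one group-sparsity penalty on the coefficient matrix covering both low rank (column groups) and channel sparsity (row groups):

```python
import torch.nn as nn

class CompressedAwareBlock(nn.Module):
    """Hypothetical sketch: decompose one k x k conv into basis + coefficient layers."""
    def __init__(self, in_ch, out_ch, kernel_size, rank, stride=1, padding=0):
        super().__init__()
        # Basis layer: `rank` filters spanning the original filter space.
        self.basis = nn.Conv2d(in_ch, rank, kernel_size,
                               stride=stride, padding=padding, bias=False)
        # Coefficient layer: 1x1 conv whose weight is the (out_ch x rank)
        # coefficient matrix combining basis responses into output channels.
        self.coeff = nn.Conv2d(rank, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        return self.coeff(self.basis(x))

def group_sparsity_penalty(block):
    """Group regularizer on the coefficient matrix: column groups drive the
    rank down (basis filters), row groups drive channel sparsity (outputs)."""
    w = block.coeff.weight.squeeze(-1).squeeze(-1)   # shape (out_ch, rank)
    return w.norm(dim=0).sum() + w.norm(dim=1).sum()
```

Under this construction, a zeroed column of the coefficient matrix removes a basis filter (lowering the rank), while a zeroed row removes an output channel; after pruning, the two small layers can be merged back into one convolution where that further reduces redundancy, as described above.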
(2) A novel dynamic and progressive filter pruning scheme for compressing convolutional neural networks is proposed. Existing pruning methods mostly require a cumbersome procedure that brings many extra hyper-parameters and training epochs, because sparsity and pruning stages alone cannot achieve satisfying performance. Besides, many works do not consider how the pruning ratio should differ across layers. To overcome these limitations, we propose a novel dynamic and progressive filter pruning (DPFPS) scheme that directly learns a structured sparse network from scratch. In particular, DPFPS imposes a new structured sparsity-inducing regularization specifically upon the expected pruning parameters in a dynamic sparsity manner: the dynamic sparsity scheme determines the sparsity allocation ratios of different layers, and a Taylor-series-based channel sensitivity criterion identifies the expected pruning parameters. Moreover, we increase the structured sparsity-inducing penalty progressively. Our method solves the pruning-ratio-based optimization problem with an iterative shrinkage-thresholding algorithm (ISTA) under dynamic sparsity. At the end of training, we only need to remove the redundant parameters, with no further stages such as fine-tuning. Extensive experimental results show that the proposed method is competitive with state-of-the-art methods on both small-scale and large-scale datasets. It also generalizes well to detection tasks.
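As a sketch of the ISTA step with a progressive penalty, the fragment below applies group soft-thresholding (the proximal operator of the group-lasso penalty) filter by filter; the function names and the linear ramp schedule are illustrative assumptions, not the exact schedule or grouping used in DPFPS:

```python
import torch

def group_soft_threshold(weight, lam):
    """Proximal step of the group-lasso penalty, one group per output filter.
    Filters whose norm falls below `lam` become exactly zero, which is the
    structured sparsity that is pruned away after training."""
    flat = weight.flatten(1)                               # (num_filters, rest)
    norms = flat.norm(dim=1, keepdim=True)
    scale = torch.clamp(1.0 - lam / (norms + 1e-12), min=0.0)
    return (flat * scale).view_as(weight)

def penalty_at(epoch, total_epochs, lam_max):
    """Progressive schedule: the penalty ramps up from 0 to lam_max."""
    return lam_max * min(1.0, epoch / float(total_epochs))

# Inside the training loop, after the usual gradient step on the data loss:
#   with torch.no_grad():
#       w = conv.weight
#       w.copy_(group_soft_threshold(w, lr * penalty_at(epoch, epochs, lam_max)))
```

The dynamic-sparsity part would, every few iterations, recompute which parameters in each layer count as expected pruning parameters (e.g., via the Taylor-series sensitivity criterion) and apply the shrinkage only to those, so that per-layer pruning ratios adapt during training toward the preset global ratio.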
(3) A pruning method for convolutional neural networks based on dynamic sparsity and model-feature-learning enhanced training is proposed. Most existing pruning methods simply use the well-trained model to initialize parameters, without exploiting its feature representations. To this end, we propose a label-free and dynamic pruning method based on model-feature-learning enhanced training. During training, we use the representation information of the well-trained model and of multiple sub-models (compressed models) to carry out the entire compression task. On the one hand, the category-level information and intermediate-layer features of the well-trained model guide the task learning of the compressed models, enhancing their ability to learn the well-trained model's features; on the other hand, the outputs of the different sub-models are used to learn from each other, promoting feature learning among them. Moreover, combined with the proposed dynamic structured sparsity regularization, we design an end-to-end, label-free model compression framework. At the end of training, we only need to remove the redundant parameters, with no further stages such as fine-tuning. Extensive experimental results show that the proposed method achieves good compression performance on multiple datasets and networks.
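A rough sketch of such a label-free objective is given below; the function name, the temperature T, and the equal weighting of the terms are illustrative assumptions, and the thesis may combine the losses differently. The key property it demonstrates is that supervision comes only from the well-trained model's soft predictions and intermediate features plus mutual learning among the sub-models, never from ground-truth labels:

```python
import torch.nn.functional as F

def label_free_loss(teacher_logits, sub_logits, teacher_feats, sub_feats, T=4.0):
    """Label-free training objective (illustrative sketch).

    sub_logits / sub_feats are lists with one entry per compressed sub-model.
    No ground-truth labels are used anywhere: supervision comes from the
    well-trained teacher and from the sub-models teaching each other."""
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    loss = 0.0
    for logits, feats in zip(sub_logits, sub_feats):
        # Teacher -> sub-model: imitate soft predictions and features.
        loss = loss + T * T * F.kl_div(
            F.log_softmax(logits / T, dim=1), soft_teacher, reduction="batchmean")
        # Assumes matching feature shapes; otherwise add a projection layer.
        loss = loss + F.mse_loss(feats, teacher_feats.detach())
    # Mutual learning: each sub-model matches every other sub-model's output.
    for i in range(len(sub_logits)):
        for j in range(len(sub_logits)):
            if i != j:
                loss = loss + T * T * F.kl_div(
                    F.log_softmax(sub_logits[i] / T, dim=1),
                    F.softmax(sub_logits[j] / T, dim=1).detach(),
                    reduction="batchmean")
    return loss
```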

Pages: 128
Language: Chinese (中文)
Document Type: Dissertation (学位论文)
Identifier: http://ir.ia.ac.cn/handle/173211/44800
Collection: 模式识别国家重点实验室_视频内容安全
Recommended Citation
GB/T 7714
阮晓峰. 深度卷积神经网络压缩与加速方法研究[D]. 中国科学院自动化研究所. 中国科学院大学,2021.
Files in This Item:
File Name/Size: Thesis.pdf (23075 KB) | DocType: Dissertation (学位论文) | Access: Open Access (开放获取) | License: CC BY-NC-SA