In recent years, deep neural networks (DNNs) have developed rapidly and been applied across many fields of artificial intelligence, such as computer vision, speech recognition, and natural language processing, with remarkable performance improvements. However, this pursuit of higher performance makes DNNs demand huge numbers of parameters and heavy computation. When such models run on resource-constrained embedded or mobile devices, the available computing power is quickly exhausted, which hinders the practical application of DNNs and seriously restricts the deployment of artificial intelligence technology. Therefore, research on deep neural network compression and acceleration has important academic value and practical significance for the further deployment of deep learning.
This thesis addresses several open problems in current deep neural network compression and acceleration, mainly from three aspects: compression methods, the compression process, and feature learning. The specific research content and contributions are summarized as follows:
(1) An effective decomposition and pruning scheme for convolutional neural network compression is proposed. Channel pruning methods change a layer's input-output dimensions and may therefore perform poorly in certain blocks, such as those with element-wise addition. Decomposition methods, in turn, fail to remove all redundant channels and occupy considerable run-time memory. To overcome these limitations, we propose an effective decomposition and pruning (EDP) scheme built on a compressed-aware block. Specifically, we construct the compressed-aware block by decomposing an original network layer into two layers: one is a basis weight matrix and the other is a coefficient matrix. By imposing regularization on the coefficient matrix, the proposed method jointly optimizes for low rank and channel sparsity. Applying the compressed-aware block throughout the network, the method achieves both decomposition and pruning. Moreover, after pruning, the method merges redundant decomposed layers to further reduce redundancy. Experiments on several datasets and a range of network architectures show that the proposed method achieves a high compression ratio with acceptable accuracy degradation and is competitive with state-of-the-art methods in compression rate, accuracy, inference time, and run-time memory. It also generalizes well to detection tasks.
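To make the compressed-aware block concrete, the following is a minimal PyTorch sketch of one way such a decomposition could look: a convolution is split into a basis convolution and a 1x1 coefficient convolution, with a group-sparsity penalty on the columns of the coefficient matrix. The class name, the `rank` parameter, and the SVD-based initialization are illustrative assumptions, not the thesis's exact construction.

```python
# Sketch of a compressed-aware block: conv -> basis conv + 1x1 coefficient conv.
# The SVD initialization and names below are assumptions for illustration.
import torch
import torch.nn as nn

class CompressedAwareBlock(nn.Module):
    def __init__(self, conv: nn.Conv2d, rank: int):
        super().__init__()
        out_ch, in_ch, kh, kw = conv.weight.shape
        # Basis layer: keeps the original kernel size but has `rank` outputs.
        self.basis = nn.Conv2d(in_ch, rank, (kh, kw),
                               stride=conv.stride, padding=conv.padding, bias=False)
        # Coefficient layer: 1x1 conv mixing the `rank` basis responses.
        self.coeff = nn.Conv2d(rank, out_ch, 1, bias=conv.bias is not None)
        # Initialize from a truncated SVD of the flattened weight (one option).
        W = conv.weight.detach().reshape(out_ch, -1)            # (out, in*kh*kw)
        U, S, Vh = torch.linalg.svd(W, full_matrices=False)
        r = min(rank, S.numel())
        self.basis.weight.data[:r] = (Vh[:r] * S[:r, None]).reshape(r, in_ch, kh, kw)
        self.coeff.weight.data[:, :r] = U[:, :r].reshape(out_ch, r, 1, 1)
        if conv.bias is not None:
            self.coeff.bias.data = conv.bias.detach().clone()

    def forward(self, x):
        return self.coeff(self.basis(x))

    def group_sparsity_penalty(self):
        # L2,1 norm over columns of the coefficient matrix: driving a column
        # to zero makes the corresponding basis channel prunable.
        C = self.coeff.weight.squeeze(-1).squeeze(-1)           # (out, rank)
        return C.norm(dim=0).sum()
```

Under this view, zeroing a column of the coefficient matrix removes the corresponding basis channel, which is how joint low-rank and channel-sparsity regularization translates into actual pruning.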
(2) A novel dynamic and progressive filter pruning scheme for compressing convolutional neural networks is proposed. Most existing pruning methods require a cumbersome procedure that introduces many extra hyper-parameters and training epochs, because sparsity and pruning stages alone cannot achieve satisfactory performance. Besides, many works ignore the difference in pruning ratio across layers. To overcome these limitations, we propose a novel dynamic and progressive filter pruning (DPFPS) scheme that directly learns a structured sparse network from scratch. In particular, DPFPS imposes a new structured sparsity-inducing regularization specifically upon the parameters expected to be pruned, in a dynamic sparsity manner: a dynamic sparsity scheme determines the sparsity allocation ratios of different layers, and a Taylor-series-based channel sensitivity criterion identifies the parameters expected to be pruned. Moreover, we increase the structured sparsity-inducing penalty progressively. Our method solves the resulting pruning-ratio-based optimization problem with an iterative shrinkage-thresholding algorithm (ISTA) under dynamic sparsity, as sketched below. At the end of training, we only need to remove the redundant parameters, with no further stages such as fine-tuning. Extensive experimental results show that the proposed method is competitive with state-of-the-art methods on both small-scale and large-scale datasets. It also generalizes well to detection tasks.
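As a rough illustration of the ISTA step, the sketch below applies a group soft-thresholding proximal update to each convolutional filter after the usual gradient step. The linear penalty ramp and the per-filter grouping are simplified placeholders; the thesis's dynamic ratio allocation and Taylor-based sensitivity criterion are not reproduced here.

```python
# Sketch of an ISTA-style update for group-lasso regularization: a gradient
# step on the data loss, then a group soft-thresholding (proximal) step.
import torch
import torch.nn as nn

@torch.no_grad()
def ista_group_prox(conv: nn.Conv2d, lam: float, lr: float):
    """Shrink each output filter by the group soft-threshold
    prox(w) = max(1 - lr*lam/||w||_2, 0) * w."""
    W = conv.weight                                  # (out, in, kh, kw)
    norms = W.flatten(1).norm(dim=1).clamp_min(1e-12)
    scale = (1.0 - lr * lam / norms).clamp_min(0.0)  # per-filter shrink factor
    W.mul_(scale.view(-1, 1, 1, 1))

def progressive_lambda(base_lam: float, epoch: int, total_epochs: int) -> float:
    # Linearly ramp the penalty so sparsity emerges gradually (one simple choice).
    return base_lam * (epoch + 1) / total_epochs

# Usage inside a training loop (model, optimizer, criterion, loader assumed):
# for epoch in range(total_epochs):
#     lam = progressive_lambda(1e-4, epoch, total_epochs)
#     for x, y in loader:
#         optimizer.zero_grad()
#         loss = criterion(model(x), y)
#         loss.backward()
#         optimizer.step()                 # gradient step on the data loss
#         for m in model.modules():        # proximal step on the sparsity term
#             if isinstance(m, nn.Conv2d):
#                 ista_group_prox(m, lam, lr=optimizer.param_groups[0]['lr'])
```

Filters driven exactly to zero by the proximal step can simply be removed at the end of training, which is what allows the method to skip fine-tuning stages.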
(3) A dynamic sparsity and model-feature-learning enhanced training method for convolutional neural network pruning is proposed. Most existing pruning methods simply use a well-trained model to initialize parameters, without exploiting its feature representations. To this end, we propose a label-free, dynamic pruning method based on model feature learning enhanced training. During training, we use the representational information of the well-trained model and of different sub-models (compressed models) to drive the entire model compression task. On the one hand, the category-level information and intermediate-layer features of the well-trained model guide the task learning of the compressed models, strengthening their ability to learn the well-trained model's features; on the other hand, the output information of different sub-models is used for mutual learning, which promotes feature learning among the sub-models. Combined with the proposed dynamic structured sparsity regularization, this yields an end-to-end, label-free model compression framework. At the end of training, we only need to remove the redundant parameters, with no further stages such as fine-tuning. Extensive experimental results show that the proposed method achieves good compression performance on multiple datasets and networks.
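A minimal sketch of such a label-free objective, assuming one teacher feature map is matched and two sub-models learn from each other, might look as follows; the loss weights, temperature, and function names are illustrative assumptions rather than the thesis's exact formulation.

```python
# Sketch of a label-free training loss: KL distillation from the well-trained
# model, an L2 match on intermediate features, and mutual KL between sub-models.
import torch
import torch.nn.functional as F

def label_free_loss(teacher_logits, teacher_feat,
                    sub_logits_a, sub_feat_a,
                    sub_logits_b, T=4.0, alpha=1.0, beta=0.1, gamma=1.0):
    soft_t = F.softmax(teacher_logits / T, dim=1)
    # (i) category-level guidance from the well-trained model (no labels used)
    kd = F.kl_div(F.log_softmax(sub_logits_a / T, dim=1), soft_t,
                  reduction='batchmean') * T * T
    # (ii) intermediate-layer feature guidance
    feat = F.mse_loss(sub_feat_a, teacher_feat)
    # (iii) mutual learning: one sub-model imitates the other's soft outputs
    mutual = F.kl_div(F.log_softmax(sub_logits_a / T, dim=1),
                      F.softmax(sub_logits_b / T, dim=1).detach(),
                      reduction='batchmean') * T * T
    return alpha * kd + beta * feat + gamma * mutual
```

Because every term is computed from model outputs rather than ground-truth labels, the objective can be combined with the dynamic structured sparsity regularization to train and compress the sub-models end to end without labeled data.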