深度卷积神经网络轻量化方法研究 (Research on Lightweight Methods for Deep Convolutional Neural Networks)
胡一鸣
2021-05-24
Pages: 148
Degree Type: Doctoral
Chinese Abstract

In recent years, deep neural networks have achieved great success in many fields such as computer vision, speech recognition, and natural language processing. Deep convolutional neural networks in particular have substantially improved the performance of visual tasks such as image classification, object detection, and semantic segmentation, reaching or even surpassing human-level performance on some tasks. However, as model performance keeps improving, network architectures have become increasingly complex, and parameter counts and computational costs have grown accordingly. This high complexity severely hinders deployment on low-power, low-memory devices such as smartphones and embedded chips. Studying how to make deep models lightweight while preserving their performance is therefore of great significance for the large-scale application of deep convolutional neural networks on mobile devices. This dissertation addresses this topic by first compressing and accelerating deep convolutional neural networks manually, via channel pruning and low-bit quantization, and then, building on that work, investigating methods for the automatic design of lightweight convolutional architectures. The specific research contents and main contributions are summarized as follows:

(1) A multi-loss-aware channel pruning method for convolutional neural networks. The key to channel pruning is accurately assessing the importance of the different channels in a network. Existing methods select channels according to criteria such as channel norm, gradient magnitude, or the reconstruction error of the corresponding features, but they do not fully exploit the useful information in the original model and may mistakenly remove important channels. To locate redundant channels accurately in the vast parameter space, this dissertation designs a channel selection strategy that fuses multiple losses. First, a global classification error supervises channel selection in every layer, improving the class-awareness of the selected channels. Second, the feature distributions of the models before and after pruning are aligned by minimizing the Maximum Mean Discrepancy (MMD) between their features, and their semantic distributions are aligned in the same way, preserving the feature and semantic information of the original model. Experiments on public datasets show that the lightweight models obtained with this method achieve the highest accuracy and compression rates among published methods.

(2) A cluster-regularized low-bit quantization method for convolutional neural networks. Although low-bit (binary or ternary) neural networks can achieve very high computational efficiency when paired with suitable hardware, the lack of effective training techniques causes their performance to degrade severely on large-scale datasets. This dissertation proposes a training method with a clustering constraint: the constraint term is added to the objective loss during training, and the model weights, weight scaling factors, and cluster indicator matrix are optimized alternately, encouraging the floating-point parameters to concentrate naturally around predefined cluster centers. After training, the weights of each layer are quantized to their nearest cluster centers, which keeps the gap between the floating-point parameters and the quantized low-bit parameters small and thus yields a smooth transition from floating-point to low-bit parameters. Moreover, the entire optimization introduces only a single regularization term and adds no extra computational cost. Extensive experiments show that the proposed method effectively reduces the quantization error of low-bit neural networks and achieves the best performance among published ternary quantization methods.

(3) An automatic architecture search method for lightweight convolutional neural networks based on decoupling supernet parameters. Owing to their high efficiency and flexibility, supernet-based one-shot methods are widely used to search for lightweight network architectures. In neural architecture search, estimating model performance accurately and efficiently is a challenge: supernet-based performance estimation is very efficient, but the coupling of supernet parameters makes the estimates inaccurate and unstable. This dissertation therefore proposes a method for decoupling supernet parameters: by progressively reducing the number of candidate operations in the search space while increasing the supernet's parameters, the parameters of different architectures are decoupled. Experiments show that the proposed method effectively reduces the degree of parameter coupling in the supernet and significantly improves the accuracy and stability of performance estimation. Under both FLOPs and latency constraints, the method finds stronger lightweight convolutional architectures.

(4) An angle-based search space shrinking method for lightweight convolutional architecture search. The exponential size of the search space poses enormous challenges to search algorithms and performance estimation. For the performance estimation problem in one-shot methods, the preceding work proposed supernet parameter decoupling from the perspective of improving a single algorithm; this work instead rethinks the challenges of lightweight architecture search from the perspective of the search space itself, aiming to fundamentally alleviate the problems existing methods face in large-scale search spaces. The dissertation first proposes an angle-based metric that approximates model performance by computing the angle between a model's trained weights and their initial values; compared with existing metrics, it is more efficient, more accurate, and more stable. Building on this metric, a general search space shrinking method is further proposed, which dynamically prunes redundant candidate operations and finally yields a smaller, higher-quality search space. The shrunk space does not depend on any particular search algorithm and therefore lowers the search difficulty for different model search methods. Experiments show that methods such as SPOS and DARTS can find more accurate lightweight models in the search space shrunk by the proposed method.

English Abstract

In recent years, deep neural networks (DNNs) have achieved great success in many fields such as computer vision, speech recognition, and natural language processing. In particular, deep convolutional neural networks (CNNs) have significantly improved the performance of vision tasks including image classification, object detection, and semantic segmentation, surpassing humans in some tasks. However, as the performance of DNNs continues to improve, network structures are becoming more and more complicated, and the parameter counts and FLOPs of models have also been growing rapidly. High model complexity severely hinders the deployment of such models on low-power and low-memory devices such as smartphones and embedded chips. Therefore, research on how to design lightweight CNNs with high performance is very important for promoting the deployment of deep CNNs on mobile terminals.

In this dissertation, we begin with channel pruning and low-bit quantization, which compress and accelerate deep CNNs manually. Building on these works, we study the automatic design of lightweight architectures for deep CNNs. The specific research contents and contributions of this dissertation are summarized as follows:

(1) First, we present a multi-loss-aware channel pruning method for deep CNNs. For channel pruning, how to accurately estimate the importance of different channels matters. Existing methods select channels according to the norm of channels, the magnitude of gradients, or the reconstruction error of corresponding features, but they do not fully exploit the useful information contained in the original model and may mistakenly remove some important channels. To discover redundant channels accurately from a large number of parameters, this dissertation designs a channel selection strategy based on the integration of multiple losses. First, we introduce the global classification error to supervise channel selection for different layers, increasing the discriminative power of the selected channels. Second, we align the feature distribution and the semantic distribution between the original model and the pruned one by minimizing the Maximum Mean Discrepancy (MMD) between them, preserving the feature and semantic information of the original model. Experimental results on public datasets show that the lightweight models discovered by the proposed method achieve state-of-the-art performance and compression rates.
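To make the distribution-alignment idea concrete, the following is a minimal sketch (our illustration, not the dissertation's code) of a Gaussian-kernel estimate of the squared MMD between features of the original and pruned models. The function names, the bandwidth sigma, and the loss weights lam_f and lam_s are hypothetical.

    import torch

    def gaussian_kernel(x, y, sigma=1.0):
        # Pairwise Gaussian kernel: k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).
        dist2 = torch.cdist(x, y).pow(2)
        return torch.exp(-dist2 / (2.0 * sigma ** 2))

    def mmd2(feat_orig, feat_pruned, sigma=1.0):
        # Batch estimate of the squared Maximum Mean Discrepancy between two
        # feature batches of shape (batch, dim):
        # MMD^2 = E[k(x, x')] - 2 E[k(x, y)] + E[k(y, y')].
        k_xx = gaussian_kernel(feat_orig, feat_orig, sigma).mean()
        k_yy = gaussian_kernel(feat_pruned, feat_pruned, sigma).mean()
        k_xy = gaussian_kernel(feat_orig, feat_pruned, sigma).mean()
        return k_xx - 2.0 * k_xy + k_yy

    # During channel selection, a combined objective could then look like:
    # loss = cross_entropy + lam_f * mmd2(features_orig, features_pruned) \
    #        + lam_s * mmd2(logits_orig, logits_pruned)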

(2) Second, we present a cluster-regularized low-bit quantization method for deep CNNs. Low-bit (binary or ternary) models are able to achieve efficient inference on hardware, but they suffer large performance degradation on large-scale datasets owing to the lack of effective training methods. This dissertation presents a new training method with a clustering constraint. During the training stage, a regularization term added to the objective loss function encourages float weights to concentrate naturally around predefined cluster centers, through alternating optimization over the model weights, the scaling factors, and the cluster indicator matrix. After training, the weights of each layer are quantized to their closest cluster centers, which keeps the gap between the float weights and the low-bit ones small, so that the float parameters transition smoothly to the quantized ones. Moreover, the whole optimization procedure introduces only a regularization term and does not add extra computational burden. Experimental results show that the proposed method reduces the quantization error effectively and achieves state-of-the-art performance among ternary quantization methods.
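A minimal sketch of the clustering constraint for the ternary case, assuming per-layer centers {-alpha, 0, +alpha}. This is our simplified reading of the abstract, not the dissertation's implementation: the nearest-center assignment below plays the role of the cluster indicator matrix, and the scalar alpha stands in for the per-layer scaling factor.

    import torch

    def cluster_reg(weight, alpha):
        # Squared distance of every float weight to its nearest center in
        # {-alpha, 0, +alpha}; adding this term to the task loss pulls the
        # float weights toward the cluster centers during training.
        centers = weight.new_tensor([-alpha, 0.0, alpha])
        dist2 = (weight.reshape(-1, 1) - centers.reshape(1, -1)).pow(2)
        return dist2.min(dim=1).values.sum()

    def quantize_ternary(weight, alpha):
        # After training: snap each weight to its nearest cluster center.
        centers = weight.new_tensor([-alpha, 0.0, alpha])
        idx = (weight.reshape(-1, 1) - centers.reshape(1, -1)).abs().argmin(dim=1)
        return centers[idx].reshape(weight.shape)

    # Training objective (sketch): loss = task_loss + lam * cluster_reg(w, alpha),
    # with alpha and the assignments updated alternately with the weights.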

(3) Third, we present an architecture search method for lightweight deep CNNs based on decoupling supernet parameters. Supernet-based one-shot methods are widely used to search for lightweight architectures due to their high efficiency and flexibility. In neural architecture search, how to accurately and efficiently estimate the performance of a model remains a challenge. Supernet-based performance estimation is very efficient, but it suffers from poor accuracy and stability due to the coupling of supernet parameters. Therefore, this dissertation presents a method for decoupling supernet parameters. The proposed method progressively reduces the number of candidate operators while increasing the supernet's parameters, which reduces the degree of parameter coupling among different child models. Experimental results show that the proposed method effectively decreases the degree of parameter coupling, making performance estimation more accurate and more stable. Under both FLOPs and latency constraints, the proposed method is able to discover lightweight CNNs with higher performance.
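The sketch below illustrates one way to read the decoupling procedure, under a single-path one-shot assumption; the class and function names are hypothetical. Each choice block holds private weights per candidate operator, and a decoupling step removes the weakest candidate while duplicating the survivors' weights, so that fewer child models share the same parameters.

    import copy
    import random
    import torch.nn as nn

    class ChoiceBlock(nn.Module):
        # One supernet layer: several candidate operators, each with its own
        # private weights (single-path one-shot style).
        def __init__(self, candidates):
            super().__init__()
            self.candidates = nn.ModuleList(candidates)

        def forward(self, x, idx=None):
            # Train by uniformly sampling one candidate per forward pass.
            if idx is None:
                idx = random.randrange(len(self.candidates))
            return self.candidates[idx](x)

    def decouple_step(block, drop_idx):
        # Our reading of the abstract: drop the weakest candidate, then add an
        # independent copy of each survivor, so the supernet has fewer
        # operators but more parameters, and child models overlap less.
        survivors = [op for i, op in enumerate(block.candidates) if i != drop_idx]
        block.candidates = nn.ModuleList(
            survivors + [copy.deepcopy(op) for op in survivors])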

(4) Finally, we present an angle-based search space shrinking method for lightweight architecture search of deep CNNs. A search space of exponential size poses great challenges to search algorithms and performance estimation. For the problem of model performance estimation, the method for decoupling supernet parameters was presented above from the perspective of improving a single algorithm. Here, we reconsider the challenges of searching for a lightweight network structure from the search space standpoint, aiming to fundamentally alleviate the issues of searching over large-scale search spaces. First, we present an angle-based metric that estimates the performance of a model by computing the angle between its current weights and its initialized ones. The proposed metric is more efficient, more accurate, and more stable than existing metrics. Based on this metric, the dissertation further presents a general search space shrinking method, which obtains a smaller and better search space by dynamically dropping redundant operators from the search space. The shrunk search space does not depend on any specific search algorithm, so it can reduce the search difficulty of different NAS approaches. Experimental results show that popular NAS methods such as SPOS and DARTS are able to discover lightweight models with higher performance from the search space shrunk by our approach.
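The angle metric itself is straightforward to state; here is a minimal sketch (our code, with hypothetical names) measuring the angle between a child model's current weights and the same weights at initialization:

    import torch

    def angle_metric(weights_now, weights_init):
        # Flatten and concatenate all weight tensors of a child model, then
        # return the angle (in radians) between the trained weight vector
        # and its initialization; a larger angle indicates the weights have
        # moved further from their starting point.
        v_now = torch.cat([w.detach().flatten() for w in weights_now])
        v_init = torch.cat([w.detach().flatten() for w in weights_init])
        cos = torch.dot(v_now, v_init) / (v_now.norm() * v_init.norm() + 1e-12)
        return torch.acos(cos.clamp(-1.0, 1.0))

During shrinking, candidate operators whose child models consistently score low on such a metric would be the ones dropped from the search space.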

Keywords: Convolutional Neural Networks; Model Lightweighting; Channel Pruning; Low-bit Quantization; Neural Architecture Search
Language: Chinese
Funding Project: National Natural Science Foundation of China [61673376]
Sub-direction Classification (of the Seven Major Directions): Fundamentals of Pattern Recognition
Document Type: Dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/44815
Collection: Graduates_Doctoral Dissertations
Recommended Citation (GB/T 7714):
胡一鸣. 深度卷积神经网络轻量化方法研究[D]. 中国科学院自动化研究所. 中国科学院大学,2021.
Files in This Item:
File Name/Size | Document Type | Version Type | Access | License
深度卷积神经网络轻量化方法研究.pdf (11065 KB) | Dissertation | | Restricted Access | CC BY-NC-SA