深度卷积神经网络量化表示研究 (Research on Quantized Representation of Deep Convolutional Neural Networks)
贺翔宇
2022-05
Pages: 132
Degree type: Doctoral
Chinese Abstract

Quantized representation is a classic problem in machine learning. By projecting data from a continuous original space into a discrete feature space, it allows computationally expensive floating-point multiply-add operations to be replaced by efficient low-precision arithmetic, which greatly accelerates feature-vector matching and the operations built on top of it. At the same time, the convolution operation at the core of deep convolutional neural networks is itself built on floating-point multiply-accumulate operations, and convolutional layers typically account for more than 90% of the total computation of a deep convolutional neural network. Introducing quantized representation into the study of deep convolutional neural networks therefore offers a new way to reduce both their computational and space complexity.

Unfortunately, while quantized representation improves computational efficiency, it usually also discards part of the original information and degrades model performance. How to obtain the speed benefit of quantized representation without sacrificing too much representational capacity has become the key problem of quantized representation for deep convolutional neural networks. On the other hand, the dependence on large amounts of labeled data is another shortcoming of deep convolutional neural networks. In practice, users who want a more lightweight model are unwilling to expose their private data, which poses a new challenge to existing quantization algorithms that retrain on labeled datasets. Specifically, the quantized representation of deep convolutional neural networks faces the following pressing problems: how to solve the discrete optimization problem of quantization under the gradient-based update framework, and how to design learning methods for quantized representation in few-shot/zero-shot settings. To address these problems, this dissertation studies the compression and acceleration of deep convolutional neural networks from the perspective of learning quantized representations, as follows:
    To address the difficulty of optimizing binarized weights during gradient backpropagation, this dissertation proposes a proxy-matrix-based method for weight binarization. Existing schemes use full-precision weights as latent variables to accumulate gradient information and binarize them in the forward pass. Although this strategy works to some extent on small datasets, the gradient noise introduced by quantization remains large, so it performs poorly on harder tasks. By introducing a proxy matrix, this dissertation associates each binarized weight with multiple latent variables, which reduces the quantization loss and weakens the influence of gradient noise in backpropagation. In addition, the constrained optimization problem arising in the construction of the binarized weights is solved iteratively, resolving the difficulty that the coupling among multiple variables makes the original problem hard to solve.
    To address the limited representational capacity of binarized activations and the difficulty of estimating their gradients in backpropagation, this dissertation proposes an activation binarization method based on sparse representation. The sign function is the usual choice for activation binarization: it quantizes the activations of a neural network into +1 and -1, which allows computationally expensive inner products to be replaced by bitwise operations. This dissertation shows that activations quantized into any two values during training can be equivalently converted to +1 and -1 at inference time without introducing extra computational overhead. Based on this observation, the dissertation analyzes the degrees of freedom of the binary quantization function and introduces learnable thresholds and sparse feature representations, which further improve the representational capacity of binary networks and substantially raise their accuracy on large-scale datasets.
    To address the severe accuracy loss of networks whose weights are quantized to low bit-widths without fine-tuning, under the unlabeled few-shot setting, this dissertation proposes a post-training weight quantization method based on iterative weight optimization and re-estimation of the statistics in batch-normalization layers. Early network compression schemes mostly retrain on the original labeled dataset to recover the accuracy lost to compression. This dissertation observes that the accuracy loss of post-training quantization largely stems from the feature-distribution shift caused by weight quantization. Two improvements are therefore proposed: iterative optimization, which minimizes the weight quantization error, and re-estimation of batch-normalization statistics from a small amount of unlabeled data, which corrects the feature-distribution shift without requiring labels. This strategy largely resolves the severe accuracy loss of low-bit quantization of convolutional neural networks under the unlabeled few-shot condition.
    To address the zero-shot setting, where network activations are unavailable, activation statistics cannot be estimated, and activation quantization is therefore impossible, this dissertation proposes a generative network quantization method based on the variational auto-encoder. Existing methods that take fitting the statistics of batch-normalization layers as the objective and update the network input noise by gradient backpropagation have empirically shown that this strategy can produce quantization-friendly "pseudo-data", but they lack a theoretical explanation. Moreover, overfitting the batch-normalization statistics leads to generated images of poor visual quality. This dissertation models the optimization process as solving the parameters of a variational auto-encoder, derives a general form of the objective, and shows that the statistics-fitting objective of existing methods is a degenerate case of this general form. Furthermore, by introducing the post-training quantized network as additional supervision, the quality of the generated images is further improved, which benefits the accuracy of subsequent post-training quantization and quantization-aware training based on the "pseudo-data".

English Abstract

Quantized representation is a long-standing problem in computer vision and machine learning. It projects data from a continuous space into a discrete feature space to speed up subsequent feature matching, retrieval, and related operations, replacing computationally expensive floating-point multiply-add operations with more efficient fixed-point or bitwise operations. Fortunately, the convolution operation, which plays the core role in deep Convolutional Neural Networks (CNNs), is also built on floating-point multiply-accumulate operations and accounts for over 90% of the computational cost of a CNN. Therefore, quantized representation offers a way to address the high computational and space complexity of CNNs.
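To make the arithmetic concrete, the following minimal sketch (not taken from the dissertation) shows why binarizing values to {+1, -1} lets a floating-point multiply-accumulate be replaced by an XNOR followed by a popcount.

```python
# Encode +1 as bit 1 and -1 as bit 0; then for two {+1,-1} vectors of length n,
#   dot(a, b) = 2 * popcount(XNOR(a_bits, b_bits)) - n.

def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two length-n {+1,-1} vectors packed as n-bit integers."""
    mask = (1 << n) - 1                      # keep only the n valid bits
    xnor = ~(a_bits ^ b_bits) & mask         # 1 where the signs agree
    return 2 * bin(xnor).count("1") - n      # agreements minus disagreements

# Example: a = [+1, -1, +1, +1], b = [+1, +1, -1, +1]  ->  dot = 0
a_bits = 0b1011   # +1, -1, +1, +1 (MSB first)
b_bits = 0b1101   # +1, +1, -1, +1
assert binary_dot(a_bits, b_bits, 4) == 0
```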

Unfortunately, along with the improvements in computational efficiency, quantized representation also suffers from performance degradation due to the information loss incurred during quantization. A natural question is whether it is possible to enjoy both high efficiency and high accuracy. On the other hand, the dependence on a large amount of labeled data is another shortcoming of deep convolutional neural networks. In fact, the data itself has become more valuable than the model parameters. Both hardware platform suppliers and application developers expect more lightweight models; however, due to user privacy agreements and commercial licenses, they are not allowed to expose their private data and can only provide pre-trained models, which is challenging for the mainstream retraining-based quantization schemes. Overall, the quantized representation of deep neural networks is becoming more important in real-world applications, yet some crucial issues are rarely discussed, e.g., the discrete optimization problem of quantized representation under the gradient-based optimization framework, few-shot/zero-shot learning methods for quantized representation, and differentiable quantization function design. To this end, this dissertation concentrates on the compression and acceleration of deep convolutional neural networks from the perspective of learning-to-quantize. The main contributions are summarized as follows:
    This dissertation proposes a weight binarization method for Binarized Neural Networks (BNNs) based on a proxy matrix, which circumvents the non-differentiable sign function in BNN training. A common practice is to use full-precision latent weights to accumulate gradients during backpropagation, which yields promising results on small datasets; however, the performance on large-scale datasets is far from satisfactory. In this dissertation, by introducing a proxy matrix, each binarized weight is determined by multiple latent variables, which reduces both the quantization error and the gradient noise in backpropagation. In addition, we decouple the original problem into three sub-problems with two auxiliary variables and iteratively solve the constrained optimization problem in each step.
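As a rough illustration of the proxy-matrix idea, the sketch below uses a hypothetical parameterization (not the dissertation's exact formulation): each binary weight is produced from several latent variables mixed by a shared proxy vector, with a clipped straight-through estimator standing in for the sign function.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinarySign(torch.autograd.Function):
    """sign() with a clipped straight-through estimator for the backward pass."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()   # pass gradients only where |x| <= 1

class ProxyBinaryLinear(nn.Module):
    """Hypothetical layer: each binary weight depends on proxy_dim latent variables."""
    def __init__(self, in_features, out_features, proxy_dim=4):
        super().__init__()
        self.latent = nn.Parameter(torch.randn(out_features, in_features, proxy_dim) * 0.1)
        self.proxy = nn.Parameter(torch.randn(proxy_dim) * 0.1)

    def forward(self, x):
        real_w = torch.einsum("oip,p->oi", self.latent, self.proxy)  # mix latents
        bin_w = BinarySign.apply(real_w)                             # {+1, -1} weights
        return F.linear(x, bin_w)

# usage: y = ProxyBinaryLinear(64, 32)(torch.randn(8, 64))
```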
    This dissertation proposes an activation binarization method based on sparse representation to alleviate the gradient noise induced by the sign function in BNN training. The sign function is widely used in recent BNN methods to quantize activations into +1 and -1, which allows the computationally expensive inner products to be replaced with bitwise operations. In this dissertation, we first show that BNNs are free to binarize activations into any two numbers during training, which can be equivalently converted to +1 and -1 at inference time without introducing additional computational overhead. In light of this, the dissertation reconsiders the design of the sign function in terms of its degrees of freedom and proposes quantizing floating-point activations into 0 and +1 with learnable thresholds. Benefiting from the enhanced representational capacity, sparsity-inducing binarized neural networks further improve the accuracy of BNNs on large-scale datasets.
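The following sketch illustrates {0, +1} activation binarization with a learnable per-channel threshold and the affine equivalence to {+1, -1} at inference; the module and parameter names are illustrative assumptions, not the dissertation's implementation.

```python
import torch
import torch.nn as nn

class BinaryStep(torch.autograd.Function):
    """Hard threshold to {0, 1} with a clipped straight-through gradient."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return (x > 0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()

class SparseBinaryAct(nn.Module):
    """Binarize NCHW activations to {0, 1} with one learnable threshold per channel."""
    def __init__(self, num_channels):
        super().__init__()
        self.threshold = nn.Parameter(torch.zeros(1, num_channels, 1, 1))

    def forward(self, x):
        return BinaryStep.apply(x - self.threshold)   # activations in {0, 1}

# Equivalence at inference: with a in {0, 1} and s = 2a - 1 in {-1, +1},
#   W @ a = 0.5 * (W @ s) + 0.5 * (W @ 1),
# so the {0, 1} network can be executed with XNOR-popcount kernels after folding
# the constant 0.5 * (W @ 1) term into the bias.
```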
    This dissertation proposes a post-training quantization method that combines iterative optimization for weight quantization with re-estimated batch-normalization statistics to refine the activation distribution. Pioneering quantization methods commonly rely on the original labeled dataset to retrain the quantized models. This dissertation attributes the performance degradation to the feature-distribution shift caused by weight quantization. To this end, we present an iterative low-bit quantization scheme that minimizes the weight quantization error and utilizes a limited amount of unlabeled data to re-estimate the statistics of the batch-normalization layers. This strategy can be further generalized to weight pruning under the unsupervised few-shot setting.
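A minimal sketch of the two ingredients described above, assuming a plain PyTorch model and a generic symmetric uniform quantizer (the dissertation's iterative scheme is more elaborate): quantize the weights, then forward a few unlabeled batches to re-estimate the batch-normalization statistics.

```python
import torch
import torch.nn as nn

def quantize_weights_(model: nn.Module, num_bits: int = 4):
    """In-place symmetric uniform quantization of conv/linear weights (a simple baseline)."""
    qmax = 2 ** (num_bits - 1) - 1
    with torch.no_grad():
        for m in model.modules():
            if isinstance(m, (nn.Conv2d, nn.Linear)):
                scale = m.weight.abs().max().clamp(min=1e-8) / qmax
                m.weight.copy_(torch.round(m.weight / scale).clamp(-qmax, qmax) * scale)

def reestimate_bn_stats(model: nn.Module, unlabeled_loader, num_batches: int = 32):
    """Re-estimate BatchNorm running statistics from a few unlabeled batches."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.reset_running_stats()
            m.momentum = None              # cumulative moving average over the batches
    was_training = model.training
    model.train()                          # BN layers update running_mean / running_var
    with torch.no_grad():
        for i, images in enumerate(unlabeled_loader):   # loader yields images only
            if i >= num_batches:
                break
            model(images)
    model.train(was_training)

# usage: quantize_weights_(model, num_bits=4); reestimate_bn_stats(model, loader)
```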
    This dissertation proposes a generative zero-shot quantization method inspired by the variational auto-encoder, which generates pseudo-data to facilitate the subsequent activation quantization. Recent methods aim to create synthetic data that best fits the batch-normalization statistics via gradient backpropagation. Although this achieves empirical success, theoretical insights are still missing. Meanwhile, overfitting the batch-normalization statistics causes mode collapse. This dissertation formulates the optimization process as solving a variational auto-encoder and unifies the objective functions of mainstream methods, showing that the hand-crafted design is in fact a degenerate case of the general formulation. Besides, we introduce a post-training quantized model for knowledge distillation, which serves as an extra regularization that improves the quality and fidelity of the synthetic data.
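For reference, the sketch below implements the baseline described above (optimizing input noise so that intermediate features match the stored batch-normalization statistics), not the dissertation's VAE-based formulation; the function name, loss weighting, and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

def synthesize_pseudo_data(model: nn.Module, batch_shape=(16, 3, 224, 224),
                           steps=500, lr=0.1):
    """Optimize random noise so BN-input statistics match the stored running statistics."""
    model.eval()
    for p in model.parameters():
        p.requires_grad_(False)            # only the synthetic images are optimized

    stats, hooks = [], []

    def make_hook(bn):
        def hook(module, inputs, output):
            x = inputs[0]                  # features entering the BN layer
            mean = x.mean(dim=(0, 2, 3))
            var = x.var(dim=(0, 2, 3), unbiased=False)
            stats.append((mean - bn.running_mean).pow(2).mean()
                         + (var - bn.running_var).pow(2).mean())
        return hook

    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            hooks.append(m.register_forward_hook(make_hook(m)))

    x = torch.randn(batch_shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        stats.clear()
        opt.zero_grad()
        model(x)
        loss = torch.stack(stats).sum()    # distance to the stored BN statistics
        loss.backward()
        opt.step()

    for h in hooks:
        h.remove()
    return x.detach()
```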

Keywords: Quantized Representation; Feature Learning; Binarization; Deep Convolutional Neural Networks
Language: Chinese
Document type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/48482
Collection: Graduates
Recommended citation (GB/T 7714):
贺翔宇. 深度卷积神经网络量化表示研究[D]. 中国科学院自动化研究所, 2022.
Files in this item:
File name/size: 贺翔宇_毕业论文.pdf (10648 KB); Document type: Thesis; Access: Restricted; License: CC BY-NC-SA