Research on Automatic Design Methods for Deep Neural Networks
Author: 王家兴 (Wang Jiaxing)
Year: 2021
Pages: 112
Degree type: Doctoral
Chinese Abstract

In recent years, owing to breakthrough improvements in performance, deep neural networks have been widely applied to a broad range of tasks. These performance gains are largely attributable to more effective network architecture design. However, as model performance keeps improving, architectures are becoming increasingly complex and diverse, and designing even better models is becoming increasingly difficult. Moreover, given the vast number of application scenarios for deep neural networks across different domains, designing models one by one for each task, deployment environment, and resource budget by expert experience and heuristic methods can hardly keep pace with the rapidly growing demands of industry. Cutting-edge work has therefore turned to letting computers automatically and efficiently design high-performing models according to task objectives and resource constraints. Current automated methods, however, still suffer from low design efficiency, reliance on empirical tricks without analysis, and limited application scenarios. This dissertation conducts in-depth research on these problems in the automatic design of deep neural networks. The main contributions are as follows:

• To address the high time cost of searching for layer-wise network compression policies, an automatic model compression method is proposed that obtains the optimal layer-wise compression policy in a single training run, without repeated exploration. When a complex, high-performing pre-trained model is available, the most direct way to obtain a deployable lightweight network is to compress it. Consider the mixed-precision quantization of model parameters: current methods for automatically determining layer-wise quantization policies are mostly based on reinforcement learning and must repeatedly explore policies during the search. Each exploration involves a "policy sampling - quantized training - policy evaluation" cycle and is therefore very time-consuming. This dissertation proposes a Bayesian automatic model compression method that casts parameter quantization as an inference problem in a Gaussian mixture model and uses a Dirichlet process as the prior over the number of mixture components, i.e., over the quantization bit-width. Through Bayesian inference, the Dirichlet process automatically determines a reasonable bit-width from the data. The method obtains an optimized layer-wise quantization policy in one training run, avoiding the expensive "policy sampling - quantized training - policy evaluation" loop and improving the efficiency of automatic model design. Compared with reinforcement-learning-based automatic model compression, the proposed method completes the search for the optimal layer-wise quantization policy in about one tenth of the time.

• To address the reliance of current architecture search methods on empirical tricks and their lack of analysis and general guiding principles, the effect of the widely used parameter-sharing technique on network width search is analyzed, and an improved sharing mechanism is proposed. In the width search of convolutional neural networks, parameter sharing is commonly used for acceleration. There are two common sharing schemes: ordinal sharing and independent candidates. This dissertation finds that the two differ in their "level of sharing" and hence affect the search differently: ordinal sharing has a higher sharing level, which benefits model accuracy but reduces the discriminability of different candidates, while independent candidates behave in exactly the opposite way. Accordingly, this dissertation proposes an "affine parameter sharing" mechanism that unifies the two, under which a differentiable metric quantifying the level of parameter sharing can be defined. Minimizing this metric switches the search from a high sharing level to a low one, so that model parameters are optimized quickly in the early stage and different candidates are distinguished accurately in the later stage. The method outperforms contemporaneous width-search methods on the CIFAR-10 and ImageNet datasets.

• To address the limitation of current architecture search methods to a single task, an efficient architecture search method for multiple tasks is proposed. Existing neural architecture search methods cannot distinguish between tasks; in a multi-task setting, one can only search from scratch for each new task or transfer an architecture found on a previous task. The former is very time-consuming, while the latter may yield an architecture unsuited to the current task. Meta-learning can extract information shared across tasks and use it as a prior to accelerate learning on new tasks. Building on this, this dissertation combines meta-learning with architecture search and proposes "meta neural architecture search", which maintains a set of shared "meta-parameters" for the super-model and allows fine-tuning from these meta-parameters on different tasks and architectures, making the search applicable to multiple tasks. Meanwhile, the meta-parameters retain search experience from previous tasks; with this prior knowledge, a model can be adapted quickly to the current task, enabling efficient multi-task search.

English Abstract

Recent years have witnessed much success of deep neural networks in many fields such as computer vision, natural language processing, and speech recognition. This success is largely due to novel architecture design. However, with the continuous improvement of network performance and the enlargement of the design space, manually improving network architectures is becoming more and more difficult. Besides, applications in various industries give rise to a large number of different tasks, and developing task-specific model architectures for different deployment environments poses great challenges to the traditional design paradigm that relies on expertise and handcrafted heuristics. Therefore, cutting-edge works have begun to develop methods that automatically design high-performing architectures according to task-specific objectives and resource constraints. However, current automated methods still suffer from low design efficiency, reliance on expertise, lack of explainability, and limited application scenarios. This dissertation conducts in-depth research on automatic model design; the specific research content and contributions are summarized as follows:

• An automatic model compression method is proposed that finds the optimal layer-by-layer compression strategy in a one-shot manner. When a high-performing pre-trained model is available, the most direct way to obtain a deployable lightweight network is model compression. Current reinforcement learning (RL) based automatic model compression methods usually require hundreds of rollouts to train the agents, where in each rollout the network is quantized and retrained to obtain a decent reward for the current compression policy; the RL-based approaches are therefore time-consuming. Targeting the mixed-precision quantization of model parameters, i.e., finding a reasonable layer-by-layer quantization bit-width, this dissertation proposes Bayesian automatic model compression (BAMC), which casts parameter quantization as a Gaussian mixture inference problem and uses a Dirichlet process (DP) to learn the optimal quantization bit-width for each layer. BAMC is trained in a one-shot manner, avoiding the back-and-forth (re)training of reinforcement-learning-based approaches.
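The core idea admits a compact illustration. The sketch below is a minimal, hypothetical rendering of the DP-mixture view of quantization, not the dissertation's implementation: each layer's weights are fitted with a truncated Dirichlet-process Gaussian mixture, and the bit-width is read off from how many mixture components the posterior actually keeps. scikit-learn's variational DP mixture stands in for the inference procedure, and the helper `layer_bitwidth` is our own name.

```python
# Hypothetical sketch: derive a per-layer bit-width from a DP Gaussian mixture
# fitted to the layer's weights (illustration only, not the BAMC implementation).
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def layer_bitwidth(weights: np.ndarray, max_bits: int = 4, tol: float = 1e-2) -> int:
    """Estimate a quantization bit-width for one layer's weights."""
    w = weights.reshape(-1, 1)
    # Truncated DP mixture: allow up to 2**max_bits components; the DP prior
    # switches off components that the weight distribution does not support.
    dpgmm = BayesianGaussianMixture(
        n_components=2 ** max_bits,
        weight_concentration_prior_type="dirichlet_process",
        max_iter=200,
        random_state=0,
    ).fit(w)
    # Components with non-negligible posterior mass act as the effective
    # quantization levels; the bit-width is the log2 of their count.
    k_eff = max(2, int(np.sum(dpgmm.weights_ > tol)))
    return int(np.ceil(np.log2(k_eff)))

# Toy check: weights clustered around four centroids need about 2 bits.
rng = np.random.default_rng(0)
w = rng.normal(loc=rng.choice([-0.3, -0.1, 0.1, 0.3], size=3000), scale=0.01)
print(layer_bitwidth(w))  # typically 2
```

Because the mixture is inferred directly from the weights, no "policy sampling - quantized training - policy evaluation" loop is needed, which is the source of the claimed speed-up.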

• The impact of different parameter-sharing schemes on network channel-number search is analyzed, and a unified parameter-sharing mechanism is proposed that better balances training efficiency and architecture discrimination. The width configuration greatly affects the accuracy of convolutional neural networks, and parameter sharing is usually used to accelerate the width search. There are two common parameter-sharing schemes: ordinal sharing and independent selection. We find that ordinal sharing enjoys a higher level of sharing and thus accelerates the search (i.e., faster accuracy rise) by better aligning the gradients of different candidates. However, this also couples the optimization of different candidates, making architectures less discriminative. A unified sharing scheme, "affine parameter sharing" (APS), is proposed in this dissertation; APS provides a natural measurement of the sharing level among different candidates. With this measurement, we then propose a transitionary strategy for APS: the sharing level is initialized at its maximum so that network parameters can be rapidly optimized in the early stage, and is gradually annealed during the search so that good architectures can be better distinguished. We also take the computational complexity of the candidates into account during the search, in order to find models deployable under resource constraints.
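As a sketch of the mechanism (under our own simplifications, not the dissertation's code), the snippet below generates every candidate width from one shared kernel through a per-candidate affine transform. The transforms start at the identity, i.e., maximal sharing, and a differentiable sharing metric (here, mean pairwise cosine similarity of the transforms) can be added to the loss and minimized, moving the search from shared to discriminative candidates.

```python
# Hypothetical sketch of affine parameter sharing for width search.
import itertools
import torch
import torch.nn as nn

class AffineSharedLinear(nn.Module):
    """Every candidate width reuses one shared kernel via its own affine map."""

    def __init__(self, in_dim: int, max_width: int, candidate_widths):
        super().__init__()
        self.shared = nn.Parameter(torch.randn(max_width, in_dim) * 0.01)
        self.widths = list(candidate_widths)
        # Identity initialization = maximal sharing: every candidate's
        # gradient updates the same shared parameters early in the search.
        self.transforms = nn.ParameterList(
            [nn.Parameter(torch.eye(max_width)) for _ in self.widths]
        )

    def forward(self, x: torch.Tensor, cand: int) -> torch.Tensor:
        w = self.widths[cand]
        # Candidate weight = affine transform of the shared kernel,
        # truncated to the candidate's width.
        weight = (self.transforms[cand] @ self.shared)[:w]
        return x @ weight.t()

    def sharing_level(self) -> torch.Tensor:
        # Differentiable proxy for the sharing level: mean pairwise cosine
        # similarity of the affine transforms. Minimizing it anneals the
        # search from high to low sharing.
        flat = [t.flatten() for t in self.transforms]
        sims = [torch.cosine_similarity(a, b, dim=0)
                for a, b in itertools.combinations(flat, 2)]
        return torch.stack(sims).mean()

# Usage: add `lam * layer.sharing_level()` to the search loss so that sharing
# is gradually reduced as training proceeds.
layer = AffineSharedLinear(in_dim=16, max_width=32, candidate_widths=[8, 16, 32])
x = torch.randn(4, 16)
print(layer(x, cand=1).shape, layer.sharing_level().item())
```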

• A method is proposed that efficiently generates task-specific model architectures for heterogeneous tasks of various kinds. Current neural architecture search (NAS) methods can only explore the search space for a pre-defined task. For a previously unseen task, the architecture is either searched from scratch, which is inefficient, or transferred from one obtained on some other task, which might be sub-optimal. To address this problem, this dissertation proposes meta neural architecture search (M-NAS). To obtain task-specific architectures, M-NAS adopts a task-aware architecture controller for child-model generation. Since the optimal weights for different tasks and architectures span diversely, M-NAS resorts to meta-learning and learns meta-weights that efficiently adapt to a new task on the corresponding architecture with only a few gradient-descent steps. In this way, neural architecture search can be conducted efficiently on multiple heterogeneous tasks.
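The fast-adaptation step can be sketched as follows. This is a first-order, Reptile-style stand-in for the meta-weight update described above; the names `inner_adapt` and `meta_step` are ours, and M-NAS's actual task-aware controller and update rule are more involved.

```python
# Hypothetical sketch: meta-weights that adapt to a task in a few gradient steps.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def inner_adapt(meta_model: nn.Module, x, y, steps: int = 3, lr: float = 1e-2):
    """Inner loop: fine-tune a copy of the meta-weights on one task."""
    model = copy.deepcopy(meta_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
    return model

def meta_step(meta_model: nn.Module, tasks, meta_lr: float = 1e-1):
    """Outer loop (Reptile-style): move the meta-weights toward the
    task-adapted weights so they initialize unseen tasks well."""
    deltas = [torch.zeros_like(p) for p in meta_model.parameters()]
    for x, y in tasks:
        adapted = inner_adapt(meta_model, x, y)
        for d, p_meta, p_task in zip(deltas, meta_model.parameters(),
                                     adapted.parameters()):
            d += (p_task.detach() - p_meta.detach()) / len(tasks)
    with torch.no_grad():
        for p, d in zip(meta_model.parameters(), deltas):
            p += meta_lr * d

# Toy usage: two synthetic 10-way classification "tasks" on one child model.
meta_model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
tasks = [(torch.randn(16, 32), torch.randint(0, 10, (16,))) for _ in range(2)]
meta_step(meta_model, tasks)
```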

Keywords: deep learning; neural architecture search; model compression; Bayesian methods; machine learning
Language: Chinese
Sub-direction classification (of the seven major directions): AI chips and intelligent computing
Document type: Doctoral dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/44753
Collection: 复杂系统认知与决策实验室_高效智能计算与学习
Recommended citation (GB/T 7714):
王家兴. 深度神经网络自动设计方法研究[D]. 中国科学院自动化研究所, 2021.
Files in this item:
毕业论文_王家兴.pdf (9209 KB) | Document type: Dissertation | Open access | License: CC BY-NC-SA
