Most network architectures are hand-crafted, which usually requires expert knowledge, experience and elaborate design. A high-performance network architecture generally admits a tremendous number of possible configurations regarding the number of layers and the type and hyperparameters of each layer, so it is very hard for humans to design or improve a network architecture by hand. Therefore, some recent works have attempted computer-aided or automated network architecture design, but several challenges remain unsolved: (1) the large number of convolutional layers and the numerous options for the type and hyperparameters of each create a huge search space and heavy computational costs for network generation; (2) a network designed on a specific dataset or task yields inferior performance when transferred to other datasets or tasks. We begin with neural network architectures for specific tasks, designing and modifying deep neural network architectures manually. Building on these works, we then study automated neural network architecture design methods. The contributions of this dissertation are summarized as follows:
- We propose a new deep neural network architecture for handwritten Chinese character recognition. The proposed framework consists of two parts: a spatial transformer network and a deep residual network. The spatial transformer network, learned directly from the data, transforms the input handwritten Chinese character images into regular characters to improve the final recognition rate. The deep residual network uses shortcut connections to address the problem of vanishing or exploding gradients, so the learning ability of the network is also significantly improved. The experimental results show that our model achieves a new state-of-the-art performance for handwritten Chinese character recognition and is robust to irregular handwritten Chinese characters.
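The shortcut connection mentioned above can be sketched in plain Python (a minimal illustration only: `residual_block` and `transform` are hypothetical names standing in for the actual conv-BN-ReLU stack, not the dissertation's implementation):

```python
def residual_block(x, transform):
    """Residual block with a shortcut connection: output = F(x) + x.

    The identity shortcut lets the signal (and gradient) pass straight
    through the addition, which is what mitigates vanishing/exploding
    gradients in very deep networks. `transform` stands in for the
    block's learned residual function F.
    """
    return [f + xi for f, xi in zip(transform(x), x)]

# If the residual function outputs all zeros, the block reduces to the
# identity mapping, so adding depth cannot degrade representation.
identity_like = residual_block([1.0, 2.0, 3.0], lambda v: [0.0] * len(v))
```

Here the addition is element-wise over a plain list; in the real network it is a tensor addition between feature maps of matching shape.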
- We propose a novel deep neural network architecture focused on the decision-making problems in object tracking. The model uses a convolutional neural network as a decision controller and learns an optimal decision policy with a deep reinforcement learning algorithm. The decision controller can make reasonable choices for different scenarios in object tracking. To demonstrate the ability of the decision controller, we apply it to the challenging ensemble problem in Single Object Tracking, dynamically choosing the better tracker for better performance, and to the tracker-detector switching problem in Multiple Object Tracking to accelerate the tracking speed. The experimental results show that the controller for the tracker ensemble achieves state-of-the-art performance on public Single Object Tracking challenges; meanwhile, the switching controller allows us to construct a real-time Multiple Object Tracking system.
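At decision time, a controller of this kind reduces to picking the action with the highest estimated value. The sketch below shows only that final step (a simplified illustration: in the dissertation the Q-values come from a trained CNN, whereas here they are supplied directly, and the action labels are hypothetical):

```python
def choose_action(q_values):
    """Decision controller at inference time: pick the action with the
    highest estimated Q-value. Actions might be, e.g., which tracker in
    an ensemble to trust for this frame, or whether to run the tracker
    or the detector in a tracker-detector switching scheme."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Hypothetical example: action 0 = tracker A, action 1 = tracker B.
best = choose_action([0.3, 0.7])  # the controller prefers tracker B
```

During training, the Q-values themselves are learned from tracking outcomes by the deep reinforcement learning algorithm; this greedy selection is only the deployment-time behavior.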
- We propose an automated deep neural network architecture search framework based on reinforcement learning and block design. We are the first to adopt a block-wise setup in automated network generation; it shrinks the search space of the entire network design and speeds up the search process, and the generated architectures are thus succinct and generalize well. The framework includes three parts: a block-wise network search space, a reinforcement learning controller and an early-stop strategy. First, we propose a network structure code to represent the block-wise neural network architecture. Based on this structure code, we build a reinforcement learning controller that samples and searches for the optimal neural network architecture automatically. To accelerate the generation process, we also propose a distributed asynchronous framework and an early-stop strategy. The experimental results on CIFAR-10, CIFAR-100 and ImageNet show that our framework yields state-of-the-art results on image classification in comparison to hand-crafted networks.
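One possible shape for such a structure code is shown below (the field names, operation vocabulary and validity check are illustrative assumptions, not the encoding defined in the dissertation): each layer of a block is a small tuple naming its operation and its predecessors, so a whole block is just a list the controller can sample.

```python
from typing import NamedTuple, List

class LayerCode(NamedTuple):
    """One entry of a block-wise network structure code (field names
    are assumptions for illustration)."""
    index: int   # position of the layer inside the block
    op: str      # layer type, e.g. 'conv', 'pool', 'identity', 'add'
    kernel: int  # kernel size (0 when not applicable)
    pred1: int   # index of the first predecessor (0 = block input)
    pred2: int   # index of the second predecessor (0 = none)

def validate(block: List[LayerCode]) -> bool:
    """A code list is well-formed if every layer refers only to earlier
    layers, so the block describes a DAG that can be instantiated."""
    return all(c.pred1 < c.index and c.pred2 < c.index for c in block)

# Hypothetical 3-layer block: two stacked convolutions whose outputs
# are merged by an element-wise addition.
block = [
    LayerCode(1, 'conv', 3, 0, 0),
    LayerCode(2, 'conv', 3, 1, 0),
    LayerCode(3, 'add', 0, 1, 2),
]
```

Because a block is a short, flat list of such codes, the controller's action space is per-field rather than per-network, which is precisely what makes the block-wise search space small.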
- We propose a fast automated deep neural network architecture search method and a block connection search method, completing the entire automated neural network architecture search framework. Most neural architecture search methods require a very time-consuming performance evaluation process. To mitigate this cost, we first use the network structure code to represent the block-wise neural network architecture and convert the code into real-valued vectors with a layer embedding. We then adopt a Long Short-Term Memory network and a multilayer perceptron to predict the network's performance before training, which further reduces the demand for computing resources. The experimental results on CIFAR show that we can obtain a comparable result with only 1 GPU in 1 day, i.e., the amount of computation is reduced to less than one thousandth of that of the conventional method. Moreover, based on the existing framework, by modifying the search space design we propose an automated block connection search method. The experimental results show that our method can find connection styles better than those defined by human prior knowledge.
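The data flow of the performance predictor (structure codes → layer embeddings → scalar score) can be sketched as follows. This is a deliberately simplified stand-in: the operation vocabulary, normalization constants and the mean-plus-linear-layer predictor are assumptions for illustration, where the dissertation instead learns the embedding jointly with an LSTM and a multilayer perceptron.

```python
OPS = ['conv', 'pool', 'identity', 'add', 'concat']  # assumed vocabulary

def layer_embedding(op, kernel, pred1, pred2):
    """Map one structure-code entry to a real vector: a one-hot of the
    layer type concatenated with normalized scalar fields. In the real
    method this embedding is learned; here it is hand-fixed."""
    one_hot = [1.0 if op == o else 0.0 for o in OPS]
    return one_hot + [kernel / 7.0, pred1 / 10.0, pred2 / 10.0]

def predict_accuracy(codes, weights, bias):
    """Stand-in for the LSTM+MLP predictor: average the layer
    embeddings and apply a single linear layer. Only the overall data
    flow mirrors the real predictor, not its capacity."""
    vecs = [layer_embedding(*c) for c in codes]
    mean = [sum(col) / len(vecs) for col in zip(*vecs)]
    return bias + sum(w * x for w, x in zip(weights, mean))
```

Because the predictor scores a candidate architecture without training it, thousands of candidates can be ranked for the price of a single forward pass each, which is where the thousand-fold reduction in computation comes from.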