Most network architectures are hand-crafted, which usually requires expert knowledge, experience and elaborate design. A high-performance network architecture generally admits a tremendous number of possible configurations regarding the number of layers and the type and hyperparameters of each layer, so it is very hard for humans to design or improve a network architecture by hand. Therefore, some recent works have attempted computer-aided or automated network architecture design, but several challenges remain unsolved: (1) the large number of convolutional layers and the numerous options for the type and hyperparameters of each create a huge search space and heavy computational costs for network generation; (2) a network designed on a specific dataset or task yields inferior performance when transferred to other datasets or tasks. We begin with neural network architectures for specific tasks, designing and modifying deep neural network architectures manually. Building on these works, we then study automated neural network architecture design methods. The contributions of this dissertation are summarized as follows:
- We propose a new deep neural network architecture for handwritten Chinese character recognition. The proposed framework consists of two parts: a spatial transformer network and a deep residual network. The spatial transformer network, learned directly from the data, transforms the input handwritten Chinese character images into regular characters to improve the final recognition rate. The deep residual network uses shortcut connections to address the problem of vanishing or exploding gradients, so the learning ability of the network is also significantly improved. The experimental results show that our model achieves a new state-of-the-art performance for handwritten Chinese character recognition and is robust to irregular handwritten Chinese characters.
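The shortcut connection mentioned above can be sketched in plain Python (a minimal illustration only: `residual_block` and `transform` are hypothetical names standing in for the actual conv-BN-ReLU stack, not the dissertation's implementation):

```python
def residual_block(x, transform):
    """Residual block with a shortcut connection: output = F(x) + x.

    The identity shortcut lets the signal (and gradient) pass straight
    through the addition, which is what mitigates vanishing/exploding
    gradients in very deep networks. `transform` stands in for the
    block's learned residual function F.
    """
    return [f + xi for f, xi in zip(transform(x), x)]

# If the residual function outputs all zeros, the block reduces to the
# identity mapping, so adding depth cannot degrade representation.
identity_like = residual_block([1.0, 2.0, 3.0], lambda v: [0.0] * len(v))
```

Here the addition is element-wise over a plain list; in the real network it is a tensor addition between feature maps of matching shape.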
- We propose a novel deep neural network architecture focused on the decision-making problems in object tracking. The model uses a convolutional neural network as a decision controller and learns an optimal decision policy with a deep reinforcement learning algorithm. The decision controller can make reasonable choices for different scenarios in object tracking. To demonstrate the ability of the decision controller, we apply it to the challenging ensemble problem in Single Object Tracking, dynamically choosing the better tracker for better performance, and to the tracker-detector switching problem in Multiple Object Tracking to accelerate the tracking speed. The experimental results show that the controller for the tracker ensemble achieves state-of-the-art performance on public Single Object Tracking challenges; meanwhile, the switching controller allows us to construct a real-time Multiple Object Tracking system.
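At decision time, a controller of this kind reduces to picking the action with the highest estimated value. The sketch below shows only that final step (a simplified illustration: in the dissertation the Q-values come from a trained CNN, whereas here they are supplied directly, and the action labels are hypothetical):

```python
def choose_action(q_values):
    """Decision controller at inference time: pick the action with the
    highest estimated Q-value. Actions might be, e.g., which tracker in
    an ensemble to trust for this frame, or whether to run the tracker
    or the detector in a tracker-detector switching scheme."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Hypothetical example: action 0 = tracker A, action 1 = tracker B.
best = choose_action([0.3, 0.7])  # the controller prefers tracker B
```

During training, the Q-values themselves are learned from tracking outcomes by the deep reinforcement learning algorithm; this greedy selection is only the deployment-time behavior.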
- We propose an automated deep neural network architecture search framework based on reinforcement learning and block design. We are the first to adopt a block-wise setup in automated network generation; it shrinks the search space of the entire network design and speeds up the search process, and the generated architectures are thus succinct and generalize well. The framework includes three parts: a block-wise network search space, a reinforcement learning controller and an early-stop strategy. First, we propose a network structure code to represent the block-wise neural network architecture. Based on this structure code, we build a reinforcement learning controller that samples and searches for the optimal neural network architecture automatically. To accelerate the generation process, we also propose a distributed asynchronous framework and an early-stop strategy. The experimental results on CIFAR-10, CIFAR-100 and ImageNet show that our framework yields state-of-the-art results on image classification in comparison to hand-crafted networks.
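One possible shape for such a structure code is shown below (the field names, operation vocabulary and validity check are illustrative assumptions, not the encoding defined in the dissertation): each layer of a block is a small tuple naming its operation and its predecessors, so a whole block is just a list the controller can sample.

```python
from typing import NamedTuple, List

class LayerCode(NamedTuple):
    """One entry of a block-wise network structure code (field names
    are assumptions for illustration)."""
    index: int   # position of the layer inside the block
    op: str      # layer type, e.g. 'conv', 'pool', 'identity', 'add'
    kernel: int  # kernel size (0 when not applicable)
    pred1: int   # index of the first predecessor (0 = block input)
    pred2: int   # index of the second predecessor (0 = none)

def validate(block: List[LayerCode]) -> bool:
    """A code list is well-formed if every layer refers only to earlier
    layers, so the block describes a DAG that can be instantiated."""
    return all(c.pred1 < c.index and c.pred2 < c.index for c in block)

# Hypothetical 3-layer block: two stacked convolutions whose outputs
# are merged by an element-wise addition.
block = [
    LayerCode(1, 'conv', 3, 0, 0),
    LayerCode(2, 'conv', 3, 1, 0),
    LayerCode(3, 'add', 0, 1, 2),
]
```

Because a block is a short, flat list of such codes, the controller's action space is per-field rather than per-network, which is precisely what makes the block-wise search space small.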
- We propose a fast automated deep neural network architecture search method and a block connection search method, completing the entire automated neural network architecture search framework. Most neural architecture search methods require a very time-consuming performance evaluation process. To mitigate this cost, we first use the network structure code to represent the block-wise neural network architecture and convert the code into real-valued vectors with a layer embedding. We then adopt a Long Short-Term Memory network and a multilayer perceptron to predict the network's performance before training, which further reduces the demand for computing resources. The experimental results on CIFAR show that we can obtain a comparable result with only 1 GPU in 1 day, i.e., the amount of computation is reduced to less than one thousandth of that of the conventional method. Moreover, based on the existing framework, by modifying the search space design we propose an automated block connection search method. The experimental results show that our method can find connection styles better than those defined by human prior knowledge.
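The data flow of the performance predictor (structure codes → layer embeddings → scalar score) can be sketched as follows. This is a deliberately simplified stand-in: the operation vocabulary, normalization constants and the mean-plus-linear-layer predictor are assumptions for illustration, where the dissertation instead learns the embedding jointly with an LSTM and a multilayer perceptron.

```python
OPS = ['conv', 'pool', 'identity', 'add', 'concat']  # assumed vocabulary

def layer_embedding(op, kernel, pred1, pred2):
    """Map one structure-code entry to a real vector: a one-hot of the
    layer type concatenated with normalized scalar fields. In the real
    method this embedding is learned; here it is hand-fixed."""
    one_hot = [1.0 if op == o else 0.0 for o in OPS]
    return one_hot + [kernel / 7.0, pred1 / 10.0, pred2 / 10.0]

def predict_accuracy(codes, weights, bias):
    """Stand-in for the LSTM+MLP predictor: average the layer
    embeddings and apply a single linear layer. Only the overall data
    flow mirrors the real predictor, not its capacity."""
    vecs = [layer_embedding(*c) for c in codes]
    mean = [sum(col) / len(vecs) for col in zip(*vecs)]
    return bias + sum(w * x for w, x in zip(weights, mean))
```

Because the predictor scores a candidate architecture without training it, thousands of candidates can be ranked for the price of a single forward pass each, which is where the thousand-fold reduction in computation comes from.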