Deep learning techniques, especially deep neural networks, have emerged as a powerful strategy for learning feature representations directly from data and have led to remarkable breakthroughs in many image understanding tasks. Most deep neural networks in current use are designed by human experts, a process that is time-consuming and error-prone. At the same time, with the rise of the Internet of Things (IoT), embedded devices such as smartphones and uncrewed vehicles demand low-power, real-time implementations of image understanding algorithms. Although deep neural networks perform very well on many image understanding tasks, their high computational cost makes them difficult to deploy on edge devices. Therefore, how to efficiently design neural network architectures for different tasks is an important research topic. This thesis focuses on neural architecture search algorithms for efficient image understanding. The specific contributions of this thesis are summarized as follows:
This thesis proposes a cyclic differentiable architecture search algorithm. Differentiable architecture search is far more efficient than traditional neural architecture search algorithms. In the search process, it first finds an optimal architecture in a shallow search network and then builds a deep evaluation network from the searched architecture. This practice makes the search and evaluation processes independent of each other, so the searched architecture may not suit the deep evaluation network, which degrades final performance. To alleviate this problem, the proposed algorithm constructs an information feedback path between the search network and the evaluation network. First, the search network generates an initial architecture for evaluation, and the weights of the evaluation network are optimized. Second, the architecture weights in the search network are further optimized under label supervision from classification together with regularization from the evaluation network through feature distillation. Repeating this cycle jointly optimizes the search and evaluation networks and thus lets the architecture evolve to fit the final evaluation network. In this way, the algorithm alleviates the problem caused by optimizing the search and evaluation networks independently in differentiable architecture search. Its effectiveness is verified on multiple image classification datasets.
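The two-step cycle described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration rather than the thesis implementation: the three scalar candidate operations, the regression data, the squared-error loss standing in for the classification loss, the finite-difference gradient, and all function names and hyperparameters.

```python
import math

# Toy candidate operations (hypothetical; real search spaces use conv/pooling ops).
OPS = [lambda x: x, lambda x: 2.0 * x, lambda x: 0.0]
XS, YS = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # toy data generated by y = 2x

def softmax(alpha):
    m = max(alpha)
    exps = [math.exp(a - m) for a in alpha]
    total = sum(exps)
    return [e / total for e in exps]

def search_feature(alpha, x):
    # Search network: softmax-weighted mixture of all candidate operations.
    return sum(p * op(x) for p, op in zip(softmax(alpha), OPS))

def eval_feature(arch, w, x):
    # Evaluation network: the single selected operation with a trainable weight.
    return w * OPS[arch](x)

def derive_arch(alpha):
    # Pick the operation with the largest architecture weight.
    return max(range(len(alpha)), key=lambda i: alpha[i])

def train_eval(arch, w, lr=0.02, steps=20):
    # Step 1: optimize the evaluation network's weight on the task loss.
    for _ in range(steps):
        grad = sum(2 * (eval_feature(arch, w, x) - y) * OPS[arch](x)
                   for x, y in zip(XS, YS)) / len(XS)
        w -= lr * grad
    return w

def search_loss(alpha, arch, w, lam):
    # Task loss plus feature distillation from the evaluation network.
    task = sum((search_feature(alpha, x) - y) ** 2 for x, y in zip(XS, YS))
    distill = sum((search_feature(alpha, x) - eval_feature(arch, w, x)) ** 2 for x in XS)
    return (task + lam * distill) / len(XS)

def update_alpha(alpha, arch, w, lam=0.5, lr=0.1, eps=1e-5):
    # Step 2: one gradient step on alpha, using central finite differences.
    grads = []
    for i in range(len(alpha)):
        hi = alpha[:i] + [alpha[i] + eps] + alpha[i + 1:]
        lo = alpha[:i] + [alpha[i] - eps] + alpha[i + 1:]
        grads.append((search_loss(hi, arch, w, lam) - search_loss(lo, arch, w, lam)) / (2 * eps))
    return [a - lr * g for a, g in zip(alpha, grads)]

def cyclic_search(num_cycles=5):
    alpha, w = [0.0, 0.0, 0.0], 1.0
    for _ in range(num_cycles):
        arch = derive_arch(alpha)             # search network proposes an architecture
        w = train_eval(arch, w)               # evaluation network is trained on it
        alpha = update_alpha(alpha, arch, w)  # feedback flows back into the search
    return derive_arch(alpha), alpha

final_arch, final_alpha = cyclic_search()
```

In this toy setting the loop starts from uniform architecture weights and ends up selecting the doubling operation, which matches the data. The point of the sketch is only the alternation between the two networks, with the evaluation network's features acting as a regularizer on the architecture weights.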
This thesis proposes introspective distillation for sampling-based one-shot neural architecture search. Sampling-based one-shot neural architecture search shares weights among sub-networks within a super-network, which greatly shortens the time needed to evaluate the performance of a network architecture. However, weight sharing leaves the parameters of the sub-networks highly coupled, so the sub-networks in the super-network cannot be fully trained. As a result, the accuracy of a sub-network measured during evaluation does not reflect its real performance, which reduces search efficiency. To address this, the thesis proposes an introspective distillation neural architecture search algorithm. During training, the algorithm dynamically selects the best-performing sub-network as the teacher model to guide further training of the super-network, alleviating the low search efficiency caused by insufficient super-network training. The algorithm achieves significant improvements in image classification, object detection and semantic segmentation.
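As a minimal sketch of the teacher-selection idea, the snippet below picks the best-scoring sub-network as teacher and computes a distillation loss for the others. The abstract does not specify how performance is measured or which distillation loss is used, so validation accuracy as the selection criterion, the temperature-softened KL divergence, the sub-network names, and all numbers are assumptions for illustration only.

```python
import math
from typing import Dict, List

def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p: List[float], q: List[float]) -> float:
    # KL(p || q) for two discrete distributions over the same classes.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def select_teacher(val_accuracy: Dict[str, float]) -> str:
    # Assumed criterion: the sub-network with the highest current accuracy.
    return max(val_accuracy, key=val_accuracy.get)

def distillation_loss(teacher_logits: List[float],
                      student_logits: List[float],
                      temperature: float = 2.0) -> float:
    # Assumed loss: KL between temperature-softened teacher and student outputs.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return kl_divergence(p_teacher, p_student)

# Made-up scores and logits for three sampled sub-networks (not thesis results).
val_accuracy = {"subnet_a": 0.61, "subnet_b": 0.68, "subnet_c": 0.64}
logits = {
    "subnet_a": [2.0, 1.0, 0.1],
    "subnet_b": [3.0, 0.5, 0.2],
    "subnet_c": [1.5, 1.2, 0.3],
}

teacher = select_teacher(val_accuracy)
losses = {
    name: distillation_loss(logits[teacher], logits[name])
    for name in logits
    if name != teacher
}
```

Because the teacher is re-selected from the current scores each time this runs, the guiding model changes as training progresses, which is one way to read "dynamically selects" in the paragraph above.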
This thesis proposes a video text detection and tracking algorithm based on neural architecture search. Traditional neural architecture search algorithms generally search on image classification tasks and then transfer the searched network architectures to other image understanding tasks. Searching on a single non-target task yields a network architecture that is not optimal for the specific target task and cannot leverage the different supervision signals available from multiple tasks. For the video text detection task, this thesis proposes an end-to-end video text detection and tracking model whose feature descriptor jointly models detection and tracking. On this basis, a search over the backbone network architecture is performed, which significantly improves model performance.
This thesis proposes a multi-objective, multi-branch neural architecture search algorithm for image segmentation. The search space of traditional neural architecture search algorithms often contains only single-branch models, and the search takes performance as its only optimization goal, which limits the application scenarios of the searched architectures and reduces the efficiency of the search. This thesis proposes a novel search space for image segmentation that contains a lightweight self-attention module and a lightweight convolution module and supports searching over various combinations of branches at different resolutions. The search is also multi-objective: it considers model inference speed, model size and performance together. Compared with previous search algorithms, this algorithm finds a series of small, medium and large models with different branch configurations on the Pareto front, so a model can be selected flexibly according to the needs of the actual scenario.
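To make the notion of a Pareto front over the three objectives concrete, the sketch below filters a set of candidate models down to the non-dominated ones. The `Candidate` fields, the choice of latency, parameter count and mIoU as the concrete metrics, and every number are hypothetical; the abstract only names inference speed, model size and performance as objectives.

```python
from typing import List, NamedTuple

class Candidate(NamedTuple):
    name: str          # hypothetical architecture label
    latency_ms: float  # inference latency, lower is better
    params_m: float    # model size in millions of parameters, lower is better
    miou: float        # segmentation quality, higher is better

def dominates(a: Candidate, b: Candidate) -> bool:
    # a dominates b if it is no worse on every objective and better on at least one.
    no_worse = (a.latency_ms <= b.latency_ms
                and a.params_m <= b.params_m
                and a.miou >= b.miou)
    strictly_better = (a.latency_ms < b.latency_ms
                       or a.params_m < b.params_m
                       or a.miou > b.miou)
    return no_worse and strictly_better

def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    # Keep every candidate that no other candidate dominates.
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

# Made-up values for illustration, not results from the thesis.
candidates = [
    Candidate("small", 5.0, 2.0, 70.0),
    Candidate("medium", 10.0, 6.0, 75.0),
    Candidate("large", 20.0, 15.0, 78.0),
    Candidate("dominated", 12.0, 7.0, 74.0),
]
front = pareto_front(candidates)
```

Here the fourth candidate is slower, larger and less accurate than the medium one, so it is dropped, while the small, medium and large candidates each represent a different trade-off and all remain on the front.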
These innovations contribute to leading results on multiple public datasets for image understanding tasks. At the same time, this thesis focuses on the computational efficiency of neural network architectures: the architectures designed by the algorithms in this thesis have significant advantages over other methods in terms of computation cost. Finally, the methods in this thesis can be extended to other machine learning tasks based on deep neural networks, either to further improve performance on the target task or to reduce its computation cost.