English Abstract

Recently, deep learning has achieved great success in many fields, and deep model design plays an important role in that success. Consequently, a great number of excellent hand-crafted models have been proposed and deliver very high performance. However, designing deep models by hand is extremely time-consuming. Neural Architecture Search (NAS) was proposed to design architectures automatically, but vanilla NAS requires a huge computational cost. To address this, researchers have proposed many efficient NAS approaches that improve the search space, the search strategy, and performance estimation. At present, all NAS approaches employ a deep-topology search space to discover high-performance architectures.
Nevertheless, the search procedure over a deep-topology search space suffers from two issues: 1) time-consuming single-step training: NAS needs considerable time to train the search space on a proxy dataset; 2) memory inefficiency: NAS cannot process more training data simultaneously on a given computing device. A shallow-topology search space can effectively solve the above two issues, but it can also lead to a performance drop due to the large model gap between the search and evaluation phases.
The Broad Learning System employs a shallow, broad topology to deliver performance similar to or even better than that of deep networks, so it is well suited to solving the above issues. Inspired by the Broad Learning System, this thesis proposes three efficient Broad Neural Architecture Search (BNAS) approaches that improve search efficiency while avoiding a performance drop in the learned architecture. First, three broad search spaces are designed to solve the above two issues of the deep search space, and a policy-gradient-based reinforcement learning algorithm is employed for architecture optimization. Next, a continuous relaxation strategy is used to transform the search space from discrete to continuous, so that the efficiency of BNAS can be improved further. Then, the broad search space is redesigned, and the search efficiency of BNAS is further improved via an early stopping strategy. Finally, CIFAR-10 and ImageNet are used to verify the performance of the BNAS approaches. The contributions of this thesis are summarized as follows:
1 Broad convolutional neural network based broad neural architecture search
A broad search space dubbed Broad Convolutional Neural Network (BCNN) is proposed to solve the above two issues of the deep search space. Compared with a deep search space, BCNN can obtain similar or better classification performance with a shallow topology. Furthermore, BNAS-v1 is proposed by combining the broad search space with policy-gradient-based reinforcement learning. Experimental results show that BNAS-v1 is the most efficient among reinforcement-learning-based NAS approaches, and the learned architecture delivers satisfactory classification performance.
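To make the optimization concrete, the following is a minimal sketch of policy-gradient (REINFORCE) architecture search in the spirit of BNAS-v1. The search-space dimensions NUM_NODES and NUM_OPS, the controller form, and the reward placeholder are illustrative assumptions, not the thesis implementation; in practice the reward would be the validation accuracy of the sampled broad network.

    import torch
    import torch.nn as nn

    NUM_NODES, NUM_OPS = 5, 7  # hypothetical search-space dimensions

    class Controller(nn.Module):
        """Policy over architectures: one categorical op choice per node."""
        def __init__(self):
            super().__init__()
            self.logits = nn.Parameter(torch.zeros(NUM_NODES, NUM_OPS))

        def sample(self):
            dist = torch.distributions.Categorical(logits=self.logits)
            ops = dist.sample()  # sampled architecture: one op per node
            return ops, dist.log_prob(ops).sum()

    def reward_of(ops):
        # Placeholder: train the sampled broad network briefly and
        # return its validation accuracy as the reward.
        return torch.rand(()).item()

    controller = Controller()
    opt = torch.optim.Adam(controller.parameters(), lr=3e-4)
    baseline = 0.0  # moving-average baseline reduces gradient variance

    for step in range(100):
        ops, log_prob = controller.sample()
        reward = reward_of(ops)
        baseline = 0.9 * baseline + 0.1 * reward
        loss = -(reward - baseline) * log_prob  # REINFORCE update
        opt.zero_grad()
        loss.backward()
        opt.step()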
2 Differentiable broad neural architecture search
A differentiable BNAS named BNAS-v2 is proposed to solve the unfair training issue in the search procedure. BNAS-v2 employs a continuous relaxation strategy to update every candidate child network at each step, which solves the unfair training issue caused by the single-path sample-and-update optimization manner and yields a larger efficiency improvement. Furthermore, both a confident learning rate and partial connection are employed to mitigate performance collapse, a consequent issue of continuous relaxation. Experimental results show that BNAS-v2 is 4× more efficient than BNAS-v1 and delivers better classification accuracy.
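The following is a minimal sketch of the two ingredients named above: continuous relaxation (a softmax mixture over candidate operations, so every candidate receives gradients at every step) combined with a partial connection that routes only a fraction of the channels through the candidate operations. The candidate operation set and the 1/4 channel split are illustrative assumptions, not the exact BNAS-v2 configuration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MixedOp(nn.Module):
        """Softmax-weighted sum of candidate ops on 1/k of the channels."""
        def __init__(self, channels, k=4):
            super().__init__()
            self.k = k
            c = channels // k  # only this slice enters the candidate ops
            self.ops = nn.ModuleList([
                nn.Conv2d(c, c, 3, padding=1, bias=False),  # candidate op 1
                nn.Conv2d(c, c, 5, padding=2, bias=False),  # candidate op 2
                nn.AvgPool2d(3, stride=1, padding=1),       # candidate op 3
                nn.Identity(),                              # candidate op 4
            ])
            # Architecture parameters: the softmax over alpha relaxes the
            # discrete op choice, so all candidates are trained fairly.
            self.alpha = nn.Parameter(1e-3 * torch.randn(len(self.ops)))

        def forward(self, x):
            c = x.size(1) // self.k
            active, bypass = x[:, :c], x[:, c:]
            weights = F.softmax(self.alpha, dim=0)
            mixed = sum(w * op(active) for w, op in zip(weights, self.ops))
            return torch.cat([mixed, bypass], dim=1)  # bypassed channels

    x = torch.randn(2, 16, 32, 32)
    print(MixedOp(16)(x).shape)  # torch.Size([2, 16, 32, 32])

Because the bypassed channels skip the candidate operations entirely, memory use per step drops roughly by the factor k, which is what makes the partial connection attractive on memory-constrained devices.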
3 Stacked broad neural architecture search
Stacked BNAS is proposed to solve two issues of BCNN: 1) loss of scale-information diversity; 2) time-consuming knowledge embedding design. On the one hand, a stacked BCNN is proposed based on the vanilla BCNN to preserve information at all scales. On the other hand, a differentiable Knowledge Embedding Search (KES) algorithm is proposed to eliminate the time-consuming manual design of knowledge embeddings. Experimental results show that the stacked BCNN delivers better classification performance than the vanilla BCNN, and that KES effectively reduces the redundant information in hand-crafted knowledge embeddings, so that the parameter count of the stacked BCNN can be reduced without performance loss.
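The abstract does not detail KES, but one plausible reading is a differentiable gate over the dimensions of a hand-crafted knowledge embedding, trained jointly with the task loss plus a sparsity penalty so that redundant dimensions are pruned. The sigmoid gating and the L1 penalty below are assumptions made for illustration only, not the thesis's KES algorithm.

    import torch
    import torch.nn as nn

    class GatedEmbedding(nn.Module):
        """Learnable soft keep/drop decision for each embedding dimension."""
        def __init__(self, dim=64):
            super().__init__()
            self.gate_logits = nn.Parameter(torch.zeros(dim))

        def forward(self, emb):
            gates = torch.sigmoid(self.gate_logits)  # in (0, 1) per dim
            return emb * gates, gates

    ge = GatedEmbedding(dim=64)
    emb = torch.randn(8, 64)             # dummy hand-crafted embedding
    out, gates = ge(emb)
    task_loss = out.pow(2).mean()        # stand-in for the real task loss
    sparsity = 1e-2 * gates.abs().sum()  # pushes redundant gates toward 0
    (task_loss + sparsity).backward()

After training, dimensions whose gates fall below a threshold can be removed outright, which matches the abstract's claim that KES reduces the parameter count of the stacked BCNN without performance loss.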