|Place of Conferral||中国科学院自动化研究所|
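2. To meet the demand for higher detection accuracy, this dissertation proposes a network architecture design method based on statistical learning. By analyzing the relationship between object scale and the effective receptive field of a network, we find that the dilation rate of a convolution can significantly change the network's effective receptive field, and we design a method for adaptively designing and adjusting the network's receptive field. Experiments show that the method can be combined with a variety of detection algorithms and effectively improves object detection accuracy while keeping power consumption unchanged.
3. Building on the second contribution, this dissertation further proposes a neural architecture search algorithm for object detection. It uses a search space better suited to detection, in which the type of each operation and its number of channels can be searched simultaneously. Experimental results show that the resulting architectures effectively improve detection accuracy at unchanged power consumption, while also significantly enlarging the network's effective receptive field.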
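The link between dilation and receptive field mentioned in item 2 can be illustrated with simple arithmetic. The sketch below is not taken from the dissertation: it computes the theoretical receptive field of a stack of convolutions, which is only an upper bound on the effective receptive field the abstract refers to. The function name `receptive_field` and the two example layer stacks are assumptions made for illustration.

```python
def receptive_field(layers):
    """Theoretical receptive field (in input pixels, along one axis).

    `layers` is a list of (kernel_size, stride, dilation) tuples,
    ordered from the input towards the output.
    """
    rf = 1    # a single output pixel sees one input pixel before any layer
    jump = 1  # distance in input pixels between adjacent feature-map pixels
    for kernel_size, stride, dilation in layers:
        # A dilated kernel spans (kernel_size - 1) * dilation + 1 positions.
        rf += (kernel_size - 1) * dilation * jump
        jump *= stride
    return rf


# Hypothetical stacks: three 3x3 convolutions, stride 1.
plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]

print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 15
```

Both stacks use the same number of 3x3 kernels, so the weight count and FLOPs are identical; only the dilation differs, yet the theoretical receptive field roughly doubles.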
Object detection has long been one of the most fundamental and active areas of computer vision. Its core objective can be defined as follows: given an input image and a set of categories of interest, the detector must locate each object instance with a bounding box and recognize its category. It can be regarded as an extension of basic image classification, and it also serves as the basis for many high-level vision tasks such as instance segmentation, tracking, face recognition, person Re-ID, and action recognition. Object detection is also one of the most widely used computer vision algorithms in industry, where it supports many downstream applications such as human-machine interaction, autopilot, intelligent surveillance, and image retrieval. Object detection algorithms can be classified in several ways. By the means of feature extraction, they can be grouped into hand-crafted feature algorithms and deep learning algorithms. By application scenario, they can be grouped into generic object detection and single-class object detection, such as face detection and pedestrian detection. For single-class detection, the parameters can be carefully customized to the characteristics of the category.
Owing to its industrial success and generality, the demands placed on object detection are closely tied to practical use. They can be roughly categorized as follows: reducing the power consumption of object detectors, improving their precision, and handling object detection under extreme data distributions. In this dissertation, we examine these demands and propose a solution to each of them.
The main contributions and contents of this dissertation are summarized as follows:
1. First, we focus on reducing the power consumption of neural networks in tasks such as image classification and object detection. We design a computing module named spatial bottleneck that shrinks the spatial size of intermediate feature maps, and we propose a new type of convolution, named chessboard-sampling convolution, to compensate for the resulting loss. Our experiments demonstrate that the method substantially lowers the power consumption of neural networks while keeping accuracy almost unchanged in both image classification and object detection.
2. To improve the precision of object detectors under a fixed power budget, we propose a learning-based NAS method that automatically generates a network architecture suited to object detection. We find that changing dilations can significantly alter the effective receptive field of a network and thereby adapt the network to object detection. We therefore first train a parent network to learn the distribution of dilations in each convolution layer, and then build a child network from the predicted dilation distribution. Our experiments demonstrate that this approach substantially improves object detection precision without any extra power consumption or FLOPs.
3. Building on the method above, we design a fine-grained search space suited to object detection and propose a new gradient-based NAS method that automatically builds network architectures for this task. Our NAS method searches not only over different types of computing operations but also over the channel width of each operation, which had not been done before. Our experiments show that the networks found by our method achieve high precision without any extra computation cost.
4. In the last part, we focus on large-scale object detection in the wild. We find that the main pain points of object detection with very large datasets and numerous categories are the multi-label confusion problem and the extremely long-tailed data distribution. To address multi-label confusion, we propose a concurrent softmax loss function that alleviates the mutual suppression between multiple positive labels. To address the long-tailed imbalance, we propose a soft-balance sampling method together with a hybrid training scheduler, which greatly improves performance on infrequent categories while preserving performance on frequent ones. Experimental results demonstrate the effectiveness of our method.
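The mutual suppression between positive labels described in item 4 can be made concrete with a small numeric sketch. The abstract does not give the formula for the concurrent softmax loss, so the code below is not the dissertation's loss. It only illustrates one simple assumption: when computing the probability of a positive class, the other positive classes are dropped from the softmax denominator. All function names are hypothetical.

```python
import math


def plain_softmax_loss(logits, labels):
    """Sum of -log softmax(z)_i over all positive classes i."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return -sum(math.log(e / total) for e, y in zip(exps, labels) if y == 1)


def positives_excluded_loss(logits, labels):
    """Same loss, but each positive class competes only with negatives.

    Assumption for illustration: the other positive labels are removed
    from the denominator, so positives do not suppress each other.
    """
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    neg_sum = sum(e for e, y in zip(exps, labels) if y == 0)
    return -sum(
        math.log(e / (neg_sum + e)) for e, y in zip(exps, labels) if y == 1
    )


# Two positive labels with equally high logits, one negative label.
logits = [2.0, 2.0, 0.0]
labels = [1, 1, 0]

print(plain_softmax_loss(logits, labels))
print(positives_excluded_loss(logits, labels))
```

With plain softmax the two positive classes split the probability mass, so neither can exceed 0.5 and the loss stays high even though both logits are large. Removing the other positive from the denominator lets each positive class approach probability 1 independently.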
|彭君然. 目标检测中的人工神经网络结构设计及算法研究[D]. 中国科学院自动化研究所. 中国科学院自动化研究所,2020.|
|Files in This Item:|
|彭君然学位论文_终版.pdf（9308KB）||Dissertation||Restricted access||CC BY-NC-SA||Application Full Text|
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.