Research on Artificial Neural Network Architecture Design and Algorithms for Object Detection


	彭君然
	2020-12-06
Pages	108
Degree type	Doctoral
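Chinese abstract	Object detection has long been one of the most fundamental and active areas of computer vision. Given an image and a set of categories of interest, the core goal of the task is to accurately locate every object instance of those categories in the image, represent each instance's position and extent with a bounding box, and return each instance's category. It can be viewed as an extension of image classification, and it forms the basis of various high-level vision tasks such as instance segmentation, object tracking, face recognition, pedestrian recognition, and action recognition. Object detection is also one of the most widely deployed computer vision technologies in industry, providing core support for many downstream applications, including human-computer interaction, autonomous driving, intelligent surveillance, and image retrieval. Object detection can be categorized in several ways. By the way features are extracted, it can be divided into traditional hand-crafted feature methods and deep learning feature methods. By the categories being detected, it can be divided into category-specific detection and generic object detection. The former includes face detection, vehicle detection, and pedestrian detection, where the detection method can be customized using prior information such as the shape distribution of the specific category; the latter usually requires the detector to detect object instances of many categories simultaneously. Because object detection is so widely used in industry, the demands and expectations placed on the technology are inseparable from its practicality. From a practical standpoint, these demands fall into three groups: reducing the computational power consumption of object detection, improving detection accuracy, and performing object detection under specific or extreme data distributions. This dissertation investigates each of these demands in depth. Its research content and contributions are summarized as follows: 1. This dissertation proposes a network structure that computes sparsely along the spatial dimensions and designs a "chessboard-sampling" convolution, substantially reducing the computation of neural networks. Experiments show that on both classification and detection tasks the algorithm significantly reduces power consumption with little change in accuracy. 2. To address the demand for higher detection accuracy, this dissertation proposes a network structure design method based on statistical learning. By analyzing the relationship between object scale and the network's effective receptive field, it finds that the dilation rate of a convolution can significantly change the network's effective receptive field, and it designs a method for adaptively designing and adjusting the network's receptive field. Experiments show that the algorithm can be combined with various detection algorithms and effectively improves detection accuracy while keeping power consumption unchanged. 3. Building on the second contribution, this dissertation further proposes a neural architecture search algorithm for object detection and designs a search space better suited to detection, which can simultaneously search over the types of operators and their channel counts. Experimental results show that the resulting architectures effectively improve detection accuracy at unchanged power consumption while significantly enlarging the network's effective receptive field. 4. Finally, this dissertation proposes an object detection algorithm for very-large-scale data. It designs a dedicated loss function to handle the mutual suppression among different positive labels under multi-label conditions, and proposes a hybrid sampling scheme combined with a corresponding hybrid training strategy to mitigate the long-tailed distribution problem. Experiments verify the significant effectiveness of the method.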
English abstract	Object detection has always been one of the most fundamental and active areas of computer vision. The core objective of the task can be defined as follows: given an input image and a set of categories of interest, the detector must locate each object instance with a bounding box and recognize its category. It can be regarded as an extension of basic image classification, and it also serves as the basis of many high-level vision tasks such as instance segmentation, tracking, face recognition, person Re-ID, and action recognition. Object detection is also one of the most widely used computer vision techniques in industrial scenarios, supporting many downstream applications such as human-machine interaction, autopilot, intelligent surveillance, and image retrieval. Object detection can be categorized in several ways. For instance, by the means of feature extraction it can be grouped into hand-crafted feature algorithms and deep learning algorithms. By application scenario it can be grouped into generic object detection and single-class object detection, such as face detection and pedestrian detection. For single-class detection algorithms, the parameters can be carefully customized to the characteristics of the category. Owing to its industrial success and generalizability, the expectations and demands placed on this task are closely tied to its practicality, and they can be roughly categorized as follows: reducing the power consumption of object detectors, improving their precision, and handling object detection under extreme data distributions. In this dissertation, we examine each of these demands and propose a solution to each. The main contributions and contents of this dissertation are summarized as follows: 1. First, we focus on reducing the power consumption of neural networks in tasks such as image classification and object detection. We design a computing module named spatial bottleneck that shrinks the spatial size of intermediate feature maps, and propose a new type of convolution named chessboard-sampling convolution to compensate for the resulting loss. Our experiments demonstrate that the method markedly lowers the power consumption of neural networks while keeping accuracy almost unchanged in both image classification and object detection. 2. To improve the precision of object detectors while constraining power consumption, we propose a learning-based NAS method that automatically generates a network architecture suited to object detection. We find that changing dilations can significantly alter the effective receptive field of a network and thereby adapt it to object detection. We therefore first train a parent network to learn the distribution of dilations in each convolution layer, and then build a child network from the predicted dilation distribution. Our experiments demonstrate that the approach markedly improves the precision of object detection without any extra power consumption or FLOPs. 3. Building on the aforementioned method, we design a fine-grained search space suited to object detection and propose a new gradient-based NAS method that automatically builds network architectures for the task. Our NAS method searches not only over different types of computing operations but also over the appropriate channel width of each operation, which is unprecedented. Our experiments show that the networks found by our method yield high precision without any extra computation cost. 4. In the last part, we focus on large-scale object detection in the wild. We find that the main pain points of object detection with a huge data scale and numerous categories are the multi-label confusion problem and the extremely long-tailed data distribution. To solve the multi-label confusion problem, we propose a concurrent softmax loss function that alleviates the mutual suppression between multiple positive labels. For the long-tailed data imbalance problem, we propose a soft-balance sampling method together with a hybrid training scheduler, which greatly improves the performance on infrequent categories while maintaining the performance on frequent categories. Experimental results show the effectiveness of our method.
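Keywords	object detection, low power consumption, neural architecture search, long-tailed data distribution, multi-label recognition
Language	Chinese
Seven major research directions: sub-direction category	Object detection, tracking, and recognition
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/41620
Collection	Graduates_Doctoral dissertations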
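Recommended citation (GB/T 7714)	彭君然. Research on Artificial Neural Network Architecture Design and Algorithms for Object Detection[D]. Institute of Automation, Chinese Academy of Sciences. Institute of Automation, Chinese Academy of Sciences, 2020.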

Files in this item
File name/size	Document type	Version type	Access type	License
彭君然学位论文_终版.pdf (9308 KB)	Thesis		Restricted access	CC BY-NC-SA