Research on the Anchor Box Mechanism in Object Detection

	Research on the Anchor Box Mechanism in Object Detection
	Zhang Shifeng (张士峰)
	2020-05-25
Pages	142
Degree type	Doctoral
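Chinese abstract	Object detection is the technique of using a computer to locate specified objects in an image. From a research perspective, object detection underlies many high-level vision tasks, and its performance strongly affects the results of downstream tasks. From an application perspective, object detection is in wide demand in daily life and serves as a core technology behind many practical applications. Research on object detection therefore has significant theoretical and practical value. With the arrival of deep learning, the field has come to be dominated by anchor-based algorithms. These methods first tile a large number of manually designed anchor boxes over the image, then match the anchor boxes to objects, then classify the anchor boxes and refine their positions one or more times according to the matching results and the corresponding loss functions, and finally output the refined anchor boxes with their predicted classes as the detection results. The anchor box is therefore the core of these algorithms. This thesis focuses on the anchor box mechanism in object detection and explores three aspects in depth: anchor design and matching, relationship learning, and efficient prediction. It addresses the problems of existing algorithms in small-scale object detection, occluded object detection, and the accuracy/speed balance, thereby extending and improving anchor-based object detection. The main contributions are as follows: 1. A high-accuracy face detection algorithm, SFDet. To address the unfairness present in small-scale face detection, the thesis proposes an anchor design scheme based on effective receptive field theory and an equal-proportion interval principle, together with a scale-compensation anchor matching strategy, so that faces of different scales are treated fairly during training. The method achieves excellent detection performance on the academic benchmark datasets. 2. A high-efficiency face detection algorithm, FaceBoxes. The thesis designs a lightweight network structure to meet the requirement of real-time speed on a CPU, and proposes an anchor densification design strategy and a divide-and-conquer anchor matching strategy to preserve detection accuracy, achieving a good balance between speed and accuracy when detecting small-scale faces. 3. A part-based aggregation pedestrian detection algorithm, OR-CNN. The algorithm uses the structural relationships of the pedestrian body to design a part occlusion-aware feature fusion operation that alleviates inter-class occlusion, and uses the belonging relationship between anchor boxes to introduce an aggregation loss function that alleviates intra-class self-occlusion. It improves the detection of occluded pedestrians and achieves leading performance on pedestrian detection datasets for traffic scenes. 4. A joint head and human body detection algorithm, JointDet. The thesis uses the contextual relationship between the head and the body to design a joint detection method that both suppresses the false positives common in head detection and recovers body detections that were wrongly suppressed and therefore missed, effectively improving head and body detection under severe occlusion. 5. An object detection algorithm that combines the strengths of both detector families, RefineDet. The algorithm uses the upper half of the feature pyramid structure to perform a first round of classification and position refinement on the anchor boxes, and the lower half to perform a cascaded second round, thereby combining the speed of single-stage methods with the accuracy of multi-stage methods and achieving a better balance between accuracy and speed. 6. An adaptive training sample selection algorithm, ATSS. The thesis designs a training sample selection strategy that automatically selects the best positive samples for each object according to that object's statistics, efficiently improving detection performance without adding any overhead or introducing any hyperparameters.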
English abstract	Object detection is a technique that uses a computer to find objects of specific categories in an image. From the research perspective, object detection is the first step of many high-level vision tasks, and its performance has a significant impact on subsequent tasks. From the application perspective, object detection is widely needed in daily life and contributes to many practical applications as a core technology. Research on object detection therefore has important theoretical significance and application value. With the advent of deep learning, object detection has been dominated by anchor-based methods. They first tile a large number of manually designed anchor boxes on the image, then match these anchor boxes to objects, then classify the anchor boxes and refine their positions one or more times, and finally output the refined anchor boxes with confidence scores as the detection results. Thus, the anchor box is the core of this type of detector. This thesis focuses on the anchor box mechanism in object detection and explores anchor boxes in depth in three aspects: design and matching, relationship learning, and efficient prediction. The goal is to extend and improve anchor-based detectors by solving their existing issues, including small object detection, occluded object detection and the accuracy/speed balance. The main contributions of this thesis are as follows: 1. Proposing a high-accuracy face detector, SFDet. To solve the unfairness issue in small face detection, this thesis introduces an effective receptive field theory and an equal density principle to design anchor boxes, and presents a scale compensation operation to match anchor boxes. The proposed method treats faces of different scales fairly during training and achieves state-of-the-art performance on common face detection datasets. 2. Presenting a high-efficiency face detector, FaceBoxes. A lightweight backbone network is designed to achieve real-time speed on a CPU, and an anchor densification strategy and a divide-and-conquer matching strategy are introduced to ensure high accuracy, reaching a good balance between speed and accuracy when detecting small faces. 3. Introducing an occlusion-aware pedestrian detector, OR-CNN. It designs a part occlusion-aware region of interest pooling unit based on the structural information of pedestrians to alleviate the inter-class occlusion problem, and proposes an aggregation loss function based on the belonging relationship of anchor boxes to alleviate the intra-class occlusion problem. This boosts the detection of occluded pedestrians and achieves the best results on pedestrian detection datasets for traffic scenes. 4. Proposing a joint head and human detector, JointDet. This thesis designs a joint detection framework using the contextual information between head and human body, which not only suppresses common false positives in head detection but also recalls human detections that were wrongly suppressed, improving head and human detection under severe occlusion. 5. Presenting a single-shot refinement object detector, RefineDet. It uses the upper half of the feature pyramid to conduct the first-stage detection and the lower half to perform the second-stage detection, achieving the accuracy of two-stage detectors while maintaining the speed of one-stage detectors, i.e., a better balance between accuracy and speed. 6. Introducing an adaptive training sample selection method, ATSS. This thesis designs a strategy that automatically selects positive and negative training samples according to the statistical characteristics of each object, increasing detection performance without introducing any overhead or hyperparameters.
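Keywords	object detection, anchor box mechanism, design and matching, relationship learning, efficient prediction, small-scale objects, occluded objects, accuracy/speed balance
Language	Chinese
Seven major research directions, sub-direction category	Image and video processing and analysis
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/39036
Collection	Graduates_Doctoral dissertations
Corresponding author	Zhang Shifeng (张士峰)
Recommended citation (GB/T 7714)	Zhang Shifeng. Research on the Anchor Box Mechanism in Object Detection[D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2020.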
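
The generic anchor-based pipeline described in the abstract (tile anchor boxes, match them to objects, then classify and refine) can be illustrated with a minimal sketch of the first two steps. This is not code from the thesis: the function names `tile_anchors` and `match_anchors`, the stride, the scales, and the 0.5 IoU threshold are all illustrative assumptions, and the matching rule shown is the common fixed-threshold IoU rule rather than any of the strategies the thesis proposes.

```python
from itertools import product


def tile_anchors(image_size, stride, scales, ratios):
    """Tile anchors (x1, y1, x2, y2) on a regular grid of centers.

    All parameters are illustrative assumptions, not values from the thesis.
    """
    width, height = image_size
    anchors = []
    centers = product(range(stride // 2, height, stride),
                      range(stride // 2, width, stride))
    for cy, cx in centers:
        for scale, ratio in product(scales, ratios):
            w = scale * ratio ** 0.5  # ratio is width / height
            h = scale / ratio ** 0.5
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def match_anchors(anchors, objects, pos_thresh=0.5):
    """Label each anchor with the index of its best object, or -1 (background)."""
    labels = []
    for anchor in anchors:
        best_iou, best_idx = 0.0, -1
        for idx, obj in enumerate(objects):
            overlap = iou(anchor, obj)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        labels.append(best_idx if best_iou >= pos_thresh else -1)
    return labels


# Toy example: a 64x64 image, one 32x32 object centered at (24, 24).
anchors = tile_anchors(image_size=(64, 64), stride=16,
                       scales=(16, 32), ratios=(1.0,))
objects = [(8.0, 8.0, 40.0, 40.0)]
labels = match_anchors(anchors, objects)
```

In this toy setup only the single anchor that coincides with the object is labeled positive. Objects that overlap no anchor above the threshold receive no positive anchors at all under a fixed-threshold rule, which is the kind of scale-related unfairness the abstract says the anchor design and matching contributions address.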

Files in this item
File name/size	Document type	Version type	Access type	License
物体检测中锚框机制的研究-最终提交版.p (14665 KB)	Thesis		Restricted access	CC BY-NC-SA
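
The adaptive training sample selection idea in contribution 6 (choosing positives from per-object statistics) can be sketched as follows. The abstract does not give the rule itself; this sketch assumes the rule from the published ATSS paper, where the IoU threshold for each object is the mean plus the standard deviation of the IoUs of its candidate anchors. The function name `select_positives`, the `top_k` candidate count per pyramid level, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def center(box):
    """Center (x, y) of an (x1, y1, x2, y2) box."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def select_positives(levels, gt_box, top_k=9):
    """Pick positive anchors for one object from its own IoU statistics.

    levels: one list of anchors per pyramid level (hypothetical layout).
    top_k:  candidates kept per level, an assumption for this sketch.
    """
    gx, gy = center(gt_box)
    candidates = []
    for anchors in levels:
        # Keep the anchors whose centers are closest to the object center.
        by_distance = sorted(
            anchors,
            key=lambda a: (center(a)[0] - gx) ** 2 + (center(a)[1] - gy) ** 2,
        )
        candidates.extend(by_distance[:top_k])

    overlaps = [iou(a, gt_box) for a in candidates]
    # Per-object threshold: mean + standard deviation of candidate IoUs.
    threshold = mean(overlaps) + pstdev(overlaps)

    positives = []
    for anchor, overlap in zip(candidates, overlaps):
        cx, cy = center(anchor)
        inside = gt_box[0] < cx < gt_box[2] and gt_box[1] < cy < gt_box[3]
        if overlap >= threshold and inside:
            positives.append(anchor)
    return positives, threshold


# Toy example: one pyramid level with three anchors and one object.
gt = (0.0, 0.0, 10.0, 10.0)
level = [(0.0, 0.0, 10.0, 10.0), (5.0, 5.0, 15.0, 15.0), (20.0, 20.0, 30.0, 30.0)]
positives, threshold = select_positives([level], gt, top_k=3)
```

Because the threshold is computed separately for every object, an object whose candidates all overlap it poorly still gets a threshold suited to its own IoU distribution instead of a single global cutoff.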