English Abstract |
Object detection is the task of automatically locating objects of specific categories in an image. From a research perspective, object detection is the first step of many high-level vision tasks, and its performance has a significant impact on those subsequent tasks. From an application perspective, object detection is widely needed in daily life and serves as the core technology of many practical applications. Research on object detection therefore has both theoretical significance and practical value.
With the advent of deep learning, object detection has been dominated by anchor-based methods. These methods first tile a large number of manually designed anchor boxes over the image, then match the anchor boxes to objects, then classify the anchor boxes and refine their positions, and finally output the refined anchor boxes with confidence scores as the detection results. The anchor box is thus the core of this type of detector. This paper focuses on the anchor box mechanism in object detection and explores in depth the design and matching of anchor boxes, relationship learning, and efficient prediction. The goal is to extend and improve anchor-based detectors by addressing their existing issues, including small object detection, occluded object detection, and the balance between accuracy and speed. The main contributions of this paper are as follows:
1. Proposing a high-accuracy face detector, SFDet. To address the unfair treatment of small faces in face detection, this paper introduces an effective receptive field theory and an equal density principle for designing anchor boxes, and presents a scale compensation operation for matching them. The proposed method treats faces of different scales fairly during training and achieves state-of-the-art performance on common face detection datasets.
2. Presenting a high-efficiency face detector, FaceBoxes. A lightweight backbone network is designed to reach real-time speed on a CPU, and an anchor densification strategy and a divide-and-conquer strategy are introduced to maintain high performance, achieving a good balance between speed and accuracy when detecting small faces.
3. Introducing an occlusion-aware pedestrian detector, OR-CNN. It designs a part occlusion-aware region of interest pooling unit, based on the structural information of pedestrians, to alleviate the inter-occlusion problem, and proposes an aggregation loss function, based on which object each anchor box belongs to, to alleviate the intra-occlusion problem. Together these improve the detection of occluded objects and achieve the best results on pedestrian detection datasets in traffic scenes.
4. Proposing a joint head and human detector, JointDet. This paper designs a joint detection framework that exploits the contextual information between head and human. The framework both suppresses common false positives in head detection and recalls missed detections in human detection, improving the performance of head and human detection under severe occlusion.
5. Presenting a single-shot refinement object detector, RefineDet. It uses the upper half of FPN for the first-stage detection and the lower half of FPN for the second-stage detection, achieving the accuracy of two-stage detectors while maintaining the speed of one-stage detectors, i.e., a better balance between accuracy and speed.
6. Introducing an adaptive training sample selection method, ATSS. This paper designs a strategy that automatically selects positive and negative training samples according to the statistical characteristics of objects, improving detection performance without introducing any overhead or hyperparameters.
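The first two steps of the generic anchor-based pipeline described above (tiling anchor boxes and matching them to objects) can be illustrated with a minimal sketch. This is not the implementation of any detector listed here: the function names `tile_anchors`, `iou`, and `match_anchors`, the single square anchor size, and the fixed IoU threshold of 0.5 are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def tile_anchors(img_w: int, img_h: int, stride: int, size: float) -> List[Box]:
    """Tile square anchors of side `size` centred on a regular grid (hypothetical helper)."""
    half = size / 2
    anchors = []
    for cy in range(stride // 2, img_h, stride):
        for cx in range(stride // 2, img_w, stride):
            anchors.append((cx - half, cy - half, cx + half, cy + half))
    return anchors


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def match_anchors(anchors: List[Box], gts: List[Box], pos_thresh: float = 0.5) -> List[int]:
    """For each anchor, return the index of its best-overlapping object, or -1 for background.

    The fixed threshold is an assumed, simplified matching rule.
    """
    labels = []
    for anchor in anchors:
        best_iou, best_idx = 0.0, -1
        for i, gt in enumerate(gts):
            overlap = iou(anchor, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, i
        labels.append(best_idx if best_iou >= pos_thresh else -1)
    return labels


if __name__ == "__main__":
    anchors = tile_anchors(64, 64, stride=16, size=32)
    large = match_anchors(anchors, [(8.0, 8.0, 40.0, 40.0)])
    small = match_anchors(anchors, [(20.0, 20.0, 28.0, 28.0)])
    print(sum(l >= 0 for l in large), sum(l >= 0 for l in small))
```

Under these assumptions, the 32×32 object is matched by one anchor, while the 8×8 object is matched by none, because its IoU with any 32×32 anchor cannot exceed 64/1024. This is one simple way to see why small objects can be under-represented during training with a fixed anchor design and matching threshold.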
|