English abstract | Object detection is the computer vision task of finding "which objects are where" in an image or video. It has long been a central problem in high-level computer vision and also serves as the basis for other tasks such as image search, face recognition, tracking and action recognition. Research on object detection also advances low-level and mid-level computer vision, for example feature representation. Object detection is very challenging: information is lost when a 3D object in the real world is projected onto a 2D image or video, the sensor introduces systematic and random errors, and appearance varies with category, pose, deformation, illumination and occlusion. At the same time, advances in data, computing infrastructure and machine learning create many opportunities for object detection. The deformable part model has been a very popular structural model for object detection. It represents an object with a star model that connects a root template to deformable part templates. The deformable part model has substantially improved object detection, and many subsequent performance gains have been built on it. In this paper, we improve the deformable part model in representation, learning, inference and post-processing. In addition, we propose a superpixel-labeling-based method for object detection that connects deep learning and structural learning. The contributions of this paper are as follows. 1. For representation, we extend the parametric model to a joint parametric and non-parametric model in order to capture the large deformations of real-world objects. Specifically, we propose a shape-regression-based deformable part model and its stacked version. 2. For parameter learning, we propose a multi-task deformable part model in order to handle samples drawn from different distributions.
Specifically, we jointly learn a distribution-aware feature transform and a shared detector in a distribution-invariant space to handle samples at different resolutions. 3. For inference, we identify three bottlenecks and significantly accelerate each of them. Specifically, we propose discriminative low-rank filter learning, a neighborhood-aware cascade and lookup-table-based HOG feature computation. 4. For post-processing, we use global image context to find the detection hypotheses most consistent with the scene. Specifically, we propose to model the spat... |
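The star model described above can be illustrated with a minimal sketch of how one root location is scored: the root filter response, plus, for each part, the best part response near its anchor minus a quadratic deformation cost. This follows the commonly used DPM formulation, but it is not the author's implementation. Assumptions: filter responses are already computed on a single feature map (no HOG pyramid, no doubled part resolution), the displacement search is brute force over a small window rather than a generalized distance transform, and the names `part_contribution` and `star_model_score` are hypothetical.

```python
import numpy as np


def part_contribution(response, anchor, deform, radius):
    """Best score of one part placed near its anchor (brute-force search).

    response: 2D array of precomputed part-filter responses (assumed given).
    anchor:   (y, x) ideal part location on the same grid.
    deform:   (a, b, c, d) weights of the cost a*dx + b*dy + c*dx^2 + d*dy^2.
    radius:   half-width of the square displacement window.
    """
    ay, ax = anchor
    a, b, c, d = deform
    h, w = response.shape
    best = -np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = ay + dy, ax + dx
            if 0 <= y < h and 0 <= x < w:
                # Appearance score minus the penalty for moving away from the anchor.
                cost = a * dx + b * dy + c * dx * dx + d * dy * dy
                best = max(best, response[y, x] - cost)
    return best


def star_model_score(root_response, root_pos, parts, bias=0.0, radius=2):
    """Score of a star model with its root at root_pos.

    parts: list of (response, offset, deform) tuples, where offset is the
           part anchor relative to the root position (hypothetical layout).
    """
    y, x = root_pos
    score = root_response[y, x] + bias
    for response, offset, deform in parts:
        anchor = (y + offset[0], x + offset[1])
        score += part_contribution(response, anchor, deform, radius)
    return score
```

For example, with a root response of 1.0 at (2, 2) and a single part whose response peaks at 2.0 one cell diagonally from its anchor, the part contributes 2.0 minus a small deformation cost, so the total is slightly below 3.0. Raising the quadratic weights makes the part prefer staying at its anchor, which is how the star model trades appearance evidence against geometric consistency.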