With the rapid development of computer, multimedia, and network technology, multimedia resources such as images, Flash animations, and videos have become the mainstream medium of information exchange. How to manage and use this large volume of multimedia resources effectively, and how to quickly find the data that is useful to users, are key problems for content-based image retrieval systems. Object detection and recognition is one of the most important research topics in content-based image retrieval, with significant application value for smart video surveillance, image and video retrieval, information security, and related areas. In this thesis, we make an intensive study of object detection and recognition in images. The main contributions of this thesis are as follows.

Firstly, we present a novel approach to measuring the similarity between objects by matching a local "appearance contextual descriptor", which is robust across a substantial range of deformations, including lens deformation, non-rigid deformation, local affine deformation, and intra-class deformation. The descriptor has two components: a histogram of oriented gradients feature that represents the appearance of a local patch, and a contextual descriptor that captures not only the spatial distribution of the non-reference patches relative to the reference patch but also the appearance similarities between the reference patch and the non-reference patches in the region. We treat recognition in a nearest-neighbor classification framework and match objects within regions without any prior learning. We compare our method with commonly used methods and demonstrate its applicability to object matching.

Secondly, we propose a new method for object detection that integrates a part-based model with cascades of boosted classifiers. The parts are labeled in a supervised manner. For each part, we construct a boosted cascade by selecting the most discriminative features from a large feature set and combining increasingly complex classifiers. We then learn a model of the spatial relations between the parts. The experimental results demonstrate that training a cascade of boosted classifiers for each part and adding spatial constraints among the parts improves both detection and localization performance. In addition, to avoid the noise that hand-labeling the training images may introduce, we learn the part models in a weakly supervised manner, in which object labels are provided but part labels are produced by training. The experimental results ...
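
The cascade idea in the second contribution can be illustrated with a minimal Python sketch. This is not the thesis implementation: the decision stumps, weights, and stage thresholds below are hand-set, hypothetical values rather than learned ones, and the names `stump` and `cascade_predict` are introduced only for illustration. The sketch shows the general mechanism of a cascade of boosted classifiers, in which each stage is a weighted vote of weak classifiers and a candidate window is rejected as soon as any stage score falls below that stage's threshold, so most negatives are discarded by the cheap early stages.

```python
from typing import Callable, List, Sequence, Tuple

# A weak classifier maps a feature vector to a vote of +1 or -1.
WeakClassifier = Callable[[Sequence[float]], float]
# A stage is a list of (weight, weak classifier) pairs plus a stage threshold.
Stage = Tuple[List[Tuple[float, WeakClassifier]], float]


def stump(index: int, threshold: float) -> WeakClassifier:
    """Decision stump: votes +1 if feature[index] > threshold, else -1."""
    return lambda x: 1.0 if x[index] > threshold else -1.0


def cascade_predict(stages: List[Stage], x: Sequence[float]) -> Tuple[bool, int]:
    """Run x through the cascade.

    Returns (accepted, number_of_stages_evaluated). A window is rejected
    at the first stage whose weighted vote is below the stage threshold.
    """
    for i, (weighted, stage_threshold) in enumerate(stages, start=1):
        score = sum(alpha * h(x) for alpha, h in weighted)
        if score < stage_threshold:
            return False, i  # early rejection: later stages are skipped
    return True, len(stages)


# Hypothetical two-stage cascade: a cheap first stage with one stump,
# followed by a more complex stage combining two stumps.
stages: List[Stage] = [
    ([(1.0, stump(0, 0.5))], 0.0),
    ([(0.6, stump(1, 0.2)), (0.4, stump(2, 0.7))], 0.0),
]

if __name__ == "__main__":
    for window in ([0.1, 0.9, 0.9], [0.9, 0.1, 0.9], [0.9, 0.9, 0.9]):
        print(window, cascade_predict(stages, window))
```

In a part-based detector of the kind described above, one such cascade would be trained per part, and the per-part detections would then be combined using the learned spatial relations between parts.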