Image recognition is one of the core branches of computer vision. It integrates techniques from digital image processing, pattern recognition and machine learning, and underpins image retrieval, annotation, human-computer interaction, intelligent video surveillance, etc. Feature representation is a key and difficult problem in image recognition. This thesis focuses on natural scene image classification and object recognition, and studies feature representation for image recognition from the viewpoints of compactness, semantics and discrimination. Our efforts and contributions are as follows.

A compact spatial feature representation method for image classification is proposed. The traditional Bag-of-Visual-Words (BoVW) model usually resorts to Spatial Pyramid Matching (SPM) to exploit the spatial information of an image in the pooling step. However, the dimensionality of SPM increases exponentially with the resolution of the image division. This thesis proposes an alternative framework to SPM for describing spatial information. By directly concatenating a spatial descriptor to the appearance descriptor, the ordinary BoVW model can exploit multi-scale spatial information efficiently. We design three spatial descriptors and explain, from the viewpoint of learning multiple codebooks, why SPM suffers from information redundancy. Experimental results on two public image datasets show that the proposed model is more compact and more discriminative than SPM.

An adaptive pooling method based on Boosting for image classification is proposed. Most traditional pooling methods are heuristic. Our adaptive pooling method parameterizes pooling as a matrix and models it jointly with a discriminative classifier. In this way, more flexible pooling functions (beyond average pooling and max pooling) can be learned, and more varied spatial layouts (rather than regular rectangular regions) can be extracted.
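The idea of writing pooling as a matrix can be illustrated with a minimal NumPy sketch. It is not the method of this thesis: the function names `grid_pooling_matrix` and `pool`, the array shapes, and the assumption that local codes lie on a regular grid are all hypothetical, and the Boosting-based learning step is not shown. The sketch only shows that average pooling over regular rectangular regions is one particular choice of pooling matrix `W`, so that a learned `W` of the same shape could encode other weights and other spatial layouts.

```python
import numpy as np


def grid_pooling_matrix(grid_h, grid_w, cells_per_side):
    """Average-pooling weights for a regular cells_per_side x cells_per_side partition.

    Assumes grid_h >= cells_per_side and grid_w >= cells_per_side, so that no
    region is empty. Returns W of shape (cells_per_side**2, grid_h * grid_w);
    row r holds uniform weights over the grid positions inside region r.
    """
    # Region row / column index of every grid row / column.
    rows = np.arange(grid_h) * cells_per_side // grid_h
    cols = np.arange(grid_w) * cells_per_side // grid_w
    # Region id of every grid position, flattened in row-major order.
    region = (rows[:, None] * cells_per_side + cols[None, :]).ravel()
    W = np.zeros((cells_per_side ** 2, grid_h * grid_w))
    W[region, np.arange(grid_h * grid_w)] = 1.0
    # Normalize each row so that it averages over its region.
    return W / W.sum(axis=1, keepdims=True)


def pool(W, codes):
    """Pool local codes of shape (positions, K) into (regions, K) features."""
    return W @ codes


# Hypothetical input: a 4 x 4 grid of local codes over K = 5 visual words.
rng = np.random.default_rng(0)
codes = rng.random((16, 5))

# Fixed, heuristic pooling: a 2 x 2 partition with average pooling.
W_fixed = grid_pooling_matrix(4, 4, 2)
fixed_features = pool(W_fixed, codes)

# A learned matrix would replace W_fixed; random non-negative weights with
# rows summing to one stand in for it here as a placeholder.
W_free = rng.random((4, 16))
W_free /= W_free.sum(axis=1, keepdims=True)
free_features = pool(W_free, codes)
```

In this sketch both feature arrays have shape `(4, 5)`; the only difference is whether the rows of the pooling matrix are fixed indicators of rectangular cells or free parameters.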
Unlike existing methods, our method allows category-specific pooling to further enhance discriminative power. Experimental results on three scene datasets demonstrate the effectiveness of our method.

A multi-class learning method for CNN-based feature pooling is proposed. Recent work shows that a CNN model trained on a sufficiently large and diverse dataset such as ImageNet can be successfully transferred to other visual recognition tasks with a limited amount of training data. However, to take the output of the fully connected layer of the CNN as th...