The advent of mobile devices and media cloud services has led to the unprecedented growing of multimedia data, including texts, images and videos. Because of the high visual representativeness, image data has drawn great attention from both the industrial and research areas. However, the great number of images poses grand challenges to the traditional image analysis and management, which is often conducted by humans. Therefore, how to manage image content in an automatical way has been a heated research topic in the areas of both multimedia analysis and computer vision. In this paper, we conduct research on large-scale image retrieval in terms of its theory and application, based on computer vision techniques. To map low-level visual features to high-level semantic concepts, we propose a holistic hierarchical learning framework. On one hand, we present how to transform and store the unstructured image data in the real world to the low-level visual features in computer vision. On the other, we show how to map the low-level features to the high-level semantic concepts that can be understood by humans. Meanwhile we discuss efficient indexing techniques for fast image retrieval, and finally provide an effective way for data accessing and managing. The main contributions of this paper are listed as follows: 1. Effective logo modeling and retrieval based on spatial constraints of visual features. In this section, we propose to leverage the geometric relationships among different types of features as an effective spatial constraint to model logo images. Specifically, to build robust visual description, we propose to fuse spatial-related local features by adaptive weighting, and thus we can find the most representative feature combination to different logo images. Extensive experiments show that this approach can significantly reduce the feature mismatch and improve the accuracy for both image retrieval and recognition. 2. Image categorization based on middle-level attributes and structure learning. In this section, we propose an unsupervised approach to learn the visual attributes that characterize an object class. Specifically, to ensure the learned visual attributes to be visually recognizable and representative, in contrast to manually constructed attributes, we adopt a joint spectral clustering with a sparse feature selection scheme. Extensive experiments show that this approach not only learns clean and intuitive attributes of object classes, but also ac...
修改评论