Image segmentation is one of the most fundamental tools for constructing vision-based application systems. It has been widely used in many fields, such as vision-aided industrial inspection, visual object recognition for military applications, medical image processing, and image/video content analysis for entertainment and public security. Image segmentation has therefore drawn extensive attention from both academia and industry, and a large body of valuable research has been devoted to this task over the past few decades. However, due to the complexity of visual modeling and the ambiguity of pattern grouping, most existing methods lack generality and scalability across image segmentation problems, as well as the semantic correlation needed for object segmentation.

To address these issues, we employ supervised information to guide the segmentation. From a classification perspective, however, how to supply and how to exploit the supervised information are fundamental questions. These questions deserve further study in order to develop advanced image segmentation methods, and they form the core motivation of this thesis. After reviewing the state-of-the-art methods, we propose an exemplar-based image segmentation framework. Given a segmented example, our goal is to automatically segment new, similar images by answering two questions: (1) how to explore and exploit the supervised information hidden in the exemplar; (2) how to transfer that supervised information to the new images.

The main contributions of this thesis are as follows:

1. We propose a new framework for image segmentation built on exemplar-based contextual sparse representation. The framework casts segmentation as a supervised classification problem with two stages: dictionary construction and segmentation transfer.
The first stage consists of two sub-steps: (1) the exemplar image is over-segmented into superpixels; (2) the foreground and background superpixels are used to construct two contextual dictionaries, respectively. This treatment improves both the representation ability and the discriminative power of the dictionaries. In the second stage, the two contextual dictionaries are concatenated and used to reconstruct the superpixels of the new image via sparse representation. The reconstruction errors are then treated as the likelihood energy, which is finally integrated...
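The second-stage classification rule can be sketched as follows. This is a minimal NumPy illustration, not the thesis implementation: it assumes each superpixel is summarized by a fixed-length feature vector, uses a simple orthogonal matching pursuit for the sparse coding step, and labels a superpixel by comparing the reconstruction errors under the foreground and background sub-dictionaries (the function names and feature layout are hypothetical).

```python
import numpy as np

def omp(D, x, k):
    """Orthogonal matching pursuit: sparse code of x over dictionary D (columns = atoms)."""
    residual = x.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit the coefficients on the current support by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    code = np.zeros(D.shape[1])
    code[support] = coef
    return code

def classify_superpixel(feat, D_fg, D_bg, k=3):
    """Label one superpixel feature vector by sparse reconstruction error.

    D_fg / D_bg are the two contextual dictionaries (columns = atoms);
    the concatenated dictionary is used for coding, and the per-class
    reconstruction errors act as likelihood energies (smaller = more likely).
    """
    D = np.hstack([D_fg, D_bg])
    code = omp(D, feat, k)
    n_fg = D_fg.shape[1]
    err_fg = np.linalg.norm(feat - D_fg @ code[:n_fg])
    err_bg = np.linalg.norm(feat - D_bg @ code[n_fg:])
    label = "foreground" if err_fg < err_bg else "background"
    return label, err_fg, err_bg
```

In a full pipeline, the two errors would not be thresholded directly but plugged into the likelihood term of an energy function and smoothed jointly over neighboring superpixels, as the abstract describes.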