English Abstract | With the rapid growth of the mobile internet and of camera-based applications on smartphones and portable devices, semantic understanding of the pictures and videos these devices capture has many potential applications. Of all the information contained in an image, text carries explicit semantic information and can provide valuable cues about the image content. Automatically detecting and recognizing text in natural images has therefore gained increasing attention from the computer vision community. Text recognition in scanned documents now performs quite satisfactorily. However, because the position, size, font, illumination and deformation of text are unconstrained and the backgrounds of natural images are complex, scene text detection and recognition remain challenging, and performance is far from satisfactory. Considering the intra-class variations of scene characters and the uncertainty of the background, as well as the complexity and cross-cutting nature of scene text recognition, this dissertation presents an in-depth study of the individual sub-problems of scene text recognition: text detection, text extraction and text recognition. It does so by combining the latest advances in image processing, object detection, pattern classification and machine learning. Moreover, building on the research on these individual tasks, we propose an end-to-end scene text recognition framework. The main contributions of this dissertation are summarized as follows: 1. Because of the high degree of intra-class variation among scene characters and the limited number of training samples, a single information source or classifier is not enough to separate text from the non-text background. To address these problems, we propose two scene text detection methods that use graph models to incorporate various information sources into one framework and thereby improve text detection performance. 
First, we propose a graph-based background suppression method for scene text detection. Treating each pixel as a node in the graph, our approach incorporates region-based classification results, color and gradient information into a cost function, which is optimized to obtain the best background suppression result. Experimental results demonstrate the superiority of our method over other preprocessing methods. To make full use of contextual information, we propose a novel scene text detection approach using a graph model built upon Maximally Stable ... |