English Abstract | With the popularity of digital image and video capturing devices (such as digital cameras, digital camcorders, smartphones and tablet PCs) and the increasingly close connection between people's daily lives and the network (via, for example, microblogs, WeChat and shopping sites), more and more images and videos are generated and transmitted every day. As a kind of high-level semantic content, text in images and videos has attracted great interest, since it is not only very useful for describing the contents of an image or video, but can also be extracted and understood by computers more easily than other semantic information. As a means of achieving content-based image/video retrieval, classification, recommendation, filtering, etc., text detection, extraction and recognition in images and videos have received increasing attention in recent years. With this aim, this dissertation presents an in-depth study of text detection and extraction by combining techniques from image processing, pattern recognition and machine learning. Compared to existing text detection methods, the proposed methods have obvious advantages in terms of precision and recall. Some of our techniques have been applied to practical text information extraction systems. The contributions of this dissertation are summarized as follows. (1) We propose an efficient scene text localization method using gradient local correlation and a coarse-to-fine strategy. In the coarse stage, the gradient local correlation is used to characterize the density of pairwise edges with opposite gradients and consistent stroke width. From the text confidence map calculated from the gradient local correlation, we obtain the candidate text regions by image segmentation and connected component analysis. In the fine stage, the candidate regions are filtered and refined by text line classification and word segmentation. 
In our experiments on a public dataset, the proposed method outperformed existing methods in terms of both recall and precision. (2) We propose a scene text segmentation method based on seed points and semi-supervised learning. After the text polarity and stroke width are estimated using gradient local correlation, the points midway between stroke edge pairs that satisfy the estimated width and polarity are taken as foreground seeds, and the points midway between edge pairs with the opposite polarity are taken as background seeds. The whole image is then segmented into text and background based ... |