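Keywords: born-digital text detection; local contrast-based segmentation; multi-oriented text line extraction; conditional random field; page segmentation; background rectangle analysis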
Multimedia data, including text, images, and video, is growing rapidly on the Internet and mobile networks. Images with embedded text make up a considerable proportion of network media data, so reading that text helps in understanding image content. However, automatic reading of text in born-digital images remains a challenging task and has attracted great interest in both academia and industry.
A text information extraction system consists of three parts: text detection, page segmentation, and text recognition. Born-digital text detection and page segmentation face a series of challenges arising from cluttered backgrounds, color variations, multilingual text, mixed text and graphics, and complex layouts. In this thesis, we present an in-depth study of born-digital text detection and page segmentation that combines techniques from image processing, pattern recognition, and probabilistic graphical models. Experimental results on several public datasets demonstrate the effectiveness and superiority of the proposed methods. The contributions of this dissertation are summarized as follows:
- We propose a born-digital text detection method based on local contrast segmentation, which takes full advantage of the characteristics of born-digital text in web images. The method first extracts candidate text connected components (CCs), then applies text/non-text CC classification to filter out non-text CCs. Text CCs are then grouped into text lines using heuristic rules, and finally non-viable text lines are removed by text line verification. To extract candidate text CCs, we detect text contours and stroke interior regions separately and then combine them. The image is first segmented into non-smooth and smooth regions by local contrast thresholding. Within non-smooth regions, text contour pixels are separated from non-text contour pixels by local binarization, while stroke interior regions correspond directly to smooth regions. Experiments on public datasets show that the proposed method performs comparably to the best existing methods.
- We propose a multi-oriented text line extraction method based on a conditional random field (CRF). To avoid mistakenly discarding text CCs at the very beginning, we group CCs first and filter out non-text CCs afterwards. A minimum spanning tree (MST) is first built by linking adjacent nodes. Each edge is then assigned a weight with a coarse-to-fine scheme; the weight represents the belief that its two nodes belong to the same line. Text and non-text nodes in the MST are identified by a CRF model for text/non-text CC classification. Finally, text lines are obtained directly from the node labels and edge weights, and non-text lines are filtered out by text/non-text line classification. Experimental comparison with the local contrast-based segmentation method demonstrates the efficiency of the proposed method.
- We propose a page segmentation method based on background rectangle analysis. Most existing methods use only foreground or only background information, whereas the proposed method considers both. Text lines and non-text CCs are first analyzed separately and then combined to produce the segmentation result. For text lines, background rectangles are first extracted from the gaps between horizontally neighboring text CCs in the same text line. Heuristic rules and a multilayer perceptron (MLP) are then applied successively to filter out within-block rectangles. The remaining between-block rectangles are grouped into separators, which split text regions into blocks. For non-text CCs, small CCs and CCs overlapping text blocks are filtered out.
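The candidate-extraction step of the local contrast-based detection method (local contrast thresholding into smooth and non-smooth regions, then local binarization inside non-smooth regions) can be illustrated with a minimal sketch. It assumes a grayscale image given as nested lists, takes local contrast as max minus min in a square window, and binarizes each non-smooth pixel against the midpoint of its local range; the window radius `r`, the threshold `contrast_thresh`, and both formulas are illustrative assumptions, not the thesis's exact definitions.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image, values 0..255


def local_min_max(img: Image, y: int, x: int, r: int) -> Tuple[int, int]:
    """Minimum and maximum gray level in the (2r+1)x(2r+1) window around (y, x)."""
    h, w = len(img), len(img[0])
    vals = [img[j][i]
            for j in range(max(0, y - r), min(h, y + r + 1))
            for i in range(max(0, x - r), min(w, x + r + 1))]
    return min(vals), max(vals)


def segment_local_contrast(img: Image, r: int = 1,
                           contrast_thresh: int = 40) -> List[List[int]]:
    """Label map: 0 = smooth, 1 = dark contour pixel, 2 = bright contour pixel.

    Assumed contrast measure: (max - min) in the window. Pixels below the
    threshold are smooth; the others are binarized against the midpoint of
    their local gray-level range.
    """
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            lo, hi = local_min_max(img, y, x, r)
            if hi - lo < contrast_thresh:
                continue  # smooth: stroke interior or flat background
            labels[y][x] = 1 if img[y][x] < (lo + hi) / 2 else 2
    return labels


# A dark, three-pixel-wide vertical stroke on a white background.
img = [[255, 255, 0, 0, 0, 255, 255] for _ in range(5)]
for row in segment_local_contrast(img):
    print(row)  # each row prints [0, 2, 1, 0, 1, 2, 0]
```

Assuming dark text on a light background, label-1 pixels form the stroke contour, label-2 pixels lie on the background side of the edge, and the smooth pixel in column 3 is the stroke interior that would be merged with the adjacent contour pixels into one candidate CC. Handling both text polarities is left out of this sketch.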
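The group-first strategy of the CRF-based line extraction method (build an MST over CCs, weight its edges, then read lines off the node labels and edge weights) can be sketched as below. This is a simplified illustration under stated assumptions: the MST is built with Kruskal's algorithm over all CC pairs instead of only adjacent ones, `same_line_weight` is a hypothetical single-stage weight (height similarity times a distance decay) standing in for the thesis's coarse-to-fine scheme, and `is_text` stands in for the CRF node labels.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # CC bounding box (x0, y0, x1, y1)


def centroid(b: Box) -> Tuple[float, float]:
    return ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)


def minimum_spanning_tree(boxes: Sequence[Box]) -> List[Tuple[int, int]]:
    """Kruskal's algorithm over all CC pairs, using centroid distance."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(centroid(boxes[i]), centroid(boxes[j])), i, j)
        for i in range(len(boxes)) for j in range(i + 1, len(boxes)))
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two different components
            parent[ri] = rj
            tree.append((i, j))
    return tree


def same_line_weight(a: Box, b: Box) -> float:
    """Hypothetical belief in (0, 1] that two CCs lie on the same line."""
    ha, hb = a[3] - a[1], b[3] - b[1]
    d = math.dist(centroid(a), centroid(b))
    return (min(ha, hb) / max(ha, hb)) * math.exp(-d / (ha + hb))


def extract_lines(boxes: Sequence[Box], is_text: Sequence[bool],
                  min_weight: float = 0.4) -> List[List[int]]:
    """Lines = connected components of the MST after dropping non-text
    nodes and edges whose weight falls below min_weight."""
    adj = {i: [] for i in range(len(boxes))}
    for i, j in minimum_spanning_tree(boxes):
        if is_text[i] and is_text[j] and same_line_weight(boxes[i], boxes[j]) >= min_weight:
            adj[i].append(j)
            adj[j].append(i)
    seen, lines = set(), []
    for s in range(len(boxes)):
        if not is_text[s] or s in seen:
            continue
        seen.add(s)
        stack, comp = [s], []
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        lines.append(sorted(comp))
    return lines


boxes = [(0, 0, 10, 10), (12, 0, 22, 10), (24, 0, 34, 10),  # first line
         (0, 40, 10, 50), (12, 40, 22, 50),                 # second line
         (50, 0, 90, 60)]                                   # a graphic
is_text = [True, True, True, True, True, False]
print(extract_lines(boxes, is_text))  # [[0, 1, 2], [3, 4]]
```

The MST edge between the two lines survives tree construction but is cut by its low weight, and the graphic drops out through its node label, so grouping never depends on filtering CCs beforehand.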
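Because the MST in the CRF-based line extraction method is a tree, a CRF defined over it admits exact MAP labeling by max-sum dynamic programming. The sketch below shows that inference step only; the unary and pairwise scores are hypothetical placeholders (the thesis's features, potentials, training, and chosen inference procedure are not reproduced), with label 0 for non-text and 1 for text.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]


def tree_crf_map(n: int, edges: Sequence[Edge],
                 unary: Sequence[Sequence[float]],
                 pairwise: Dict[Edge, Sequence[Sequence[float]]]) -> List[int]:
    """Exact MAP labeling of a tree-structured CRF by max-sum dynamic programming.

    unary[i][a]            score of node i taking label a (0 = non-text, 1 = text)
    pairwise[(i, j)][a][b] score of edge (i, j) with label a at i and b at j
    The edges must form one tree over nodes 0..n-1 (e.g. an MST).
    """
    adj: Dict[int, List[int]] = {i: [] for i in range(n)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)

    def edge_score(u: int, v: int, a: int, b: int) -> float:
        if (u, v) in pairwise:
            return pairwise[(u, v)][a][b]
        return pairwise[(v, u)][b][a]

    # Root the tree at node 0; `order` lists every parent before its children.
    parent: Dict[int, int] = {0: -1}
    order, stack = [], [0]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)

    n_labels = len(unary[0])
    best: Dict[int, List[float]] = {}        # best subtree score given label at u
    choice: Dict[Tuple[int, int], int] = {}  # best label of child v given parent label
    for u in reversed(order):  # children before parents
        best[u] = list(unary[u])
        for v in adj[u]:
            if parent[v] != u:
                continue  # skip u's own parent
            for a in range(n_labels):
                scores = [edge_score(u, v, a, b) + best[v][b] for b in range(n_labels)]
                b_star = max(range(n_labels), key=scores.__getitem__)
                choice[(v, a)] = b_star
                best[u][a] += scores[b_star]

    # Backtrack from the root: each node takes the label chosen for its parent's label.
    labels = [0] * n
    labels[0] = max(range(n_labels), key=best[0].__getitem__)
    for u in order[1:]:
        labels[u] = choice[(u, labels[parent[u]])]
    return labels


def same_label_reward(w: float) -> List[List[float]]:
    """Hypothetical pairwise score: reward w (e.g. an MST edge weight) for equal labels."""
    return [[w, 0.0], [0.0, w]]


# Node 1 alone slightly prefers non-text, but its strongly linked text
# neighbours pull it to text; node 3 is weakly linked and stays non-text.
edges = [(0, 1), (1, 2), (2, 3)]
unary = [[0, 2], [0.5, 0], [0, 2], [3, 0]]
pairwise = {(0, 1): same_label_reward(1.0), (1, 2): same_label_reward(1.0),
            (2, 3): same_label_reward(0.1)}
print(tree_crf_map(4, edges, unary, pairwise))  # [1, 1, 1, 0]
```

Scaling the pairwise reward by the edge weight is one simple way to let the same-line belief influence classification; whether the thesis couples them this way is an assumption.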
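The text-line branch of the background rectangle method (extract gap rectangles between horizontally neighboring CCs, drop within-block rectangles, group the rest into separators) can be sketched as follows. Assumptions: text lines are already given as lists of CC bounding boxes; a single hypothetical rule, keeping gaps at least `ratio` times the line height, stands in for the thesis's heuristic rules plus MLP; and separators are formed by greedily chaining rectangles that overlap horizontally and lie within `max_vgap` pixels vertically.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def gap_rectangles(line: Sequence[Box]) -> List[Box]:
    """Background rectangles between horizontally neighbouring CCs of one line."""
    ccs = sorted(line)  # left to right by x0
    top = min(b[1] for b in ccs)
    bottom = max(b[3] for b in ccs)
    rects, right = [], ccs[0][2]
    for cur in ccs[1:]:
        if cur[0] > right:  # a real gap, not an overlap
            rects.append((right, top, cur[0], bottom))
        right = max(right, cur[2])
    return rects


def between_block_rects(lines: Sequence[Sequence[Box]], ratio: float = 1.5) -> List[Box]:
    """Keep gaps at least `ratio` times the line height (hypothetical rule)."""
    kept: List[Box] = []
    for line in lines:
        height = max(b[3] for b in line) - min(b[1] for b in line)
        kept += [r for r in gap_rectangles(line) if r[2] - r[0] >= ratio * height]
    return kept


def group_separators(rects: Sequence[Box], max_vgap: int = 10) -> List[Box]:
    """Chain vertically adjacent, horizontally overlapping rectangles into separators."""
    seps: List[List[int]] = []
    for r in sorted(rects, key=lambda b: b[1]):  # top to bottom
        for s in seps:
            overlap = min(s[2], r[2]) - max(s[0], r[0])
            if overlap > 0 and 0 <= r[1] - s[3] <= max_vgap:
                # Keep only the shared white column and extend it downward.
                s[0], s[2], s[3] = max(s[0], r[0]), min(s[2], r[2]), r[3]
                break
        else:  # no existing separator accepted r
            seps.append(list(r))
    return [tuple(s) for s in seps]


# Two text columns; each line lists its CC boxes (column 1, then column 2).
lines = [
    [(0, 0, 8, 10), (10, 0, 18, 10), (20, 0, 28, 10), (60, 0, 68, 10), (70, 0, 78, 10)],
    [(0, 14, 8, 24), (10, 14, 18, 24), (62, 14, 70, 24)],
]
print(group_separators(between_block_rects(lines)))  # [(28, 0, 60, 24)]
```

The narrow inter-character gaps are discarded as within-block rectangles, while the two wide gaps line up into one vertical separator that would split the text region into left and right blocks.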
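Chen Kai (陈凯). Research on Born-Digital Text Detection and Page Segmentation Methods for Web Images [D]. Beijing: Graduate School of the Chinese Academy of Sciences, 2016.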