Text in natural scene images conveys rich semantic information, so scene text extraction is in demand in numerous applications. However, because of the diversity of text appearance, complex backgrounds, and low imaging quality, extracting text from natural scene images is a very challenging problem. Scene text extraction involves two sub-tasks, text detection and text recognition, which are the main research objectives of this thesis. The contributions of this dissertation are summarized as follows:
A scene text detection method with superpixel-based character candidate extraction is proposed. Unlike representative character candidate extraction methods based on extremal regions, the proposed superpixel-based method fuses color and edge information to segment a scene text image into superpixels, taking advantage of the color consistency and edge visibility of characters, and then extracts character candidates through hierarchical clustering. In addition, we design a convolutional neural network based text/non-text classifier that uses the contextual information of each character candidate region and is combined with a double-threshold strategy for character candidate filtering. Experimental results on public datasets show that the proposed scene text detection system outperforms previous representative connected-component-based methods.
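The double-threshold filtering step can be illustrated with a minimal sketch. The abstract does not specify how the two thresholds are applied, so the hysteresis-style rule below is an assumption: candidates scoring above a high threshold are kept outright, and candidates scoring between the low and high thresholds are kept only if they are linked to an already kept candidate. The function name `filter_candidates`, the threshold values, and the `neighbors` adjacency map are hypothetical and chosen only for illustration.

```python
def filter_candidates(scores, neighbors, low=0.3, high=0.8):
    """Hysteresis-style double-threshold filtering (illustrative assumption).

    scores:    list of text/non-text classifier scores, one per candidate.
    neighbors: dict mapping a candidate index to adjacent candidate indices.
    Returns the sorted indices of the candidates that are kept.
    """
    # Strong candidates are accepted unconditionally.
    kept = {i for i, s in enumerate(scores) if s >= high}
    frontier = list(kept)
    # Weak candidates are accepted only when reachable from a kept candidate.
    while frontier:
        i = frontier.pop()
        for j in neighbors.get(i, ()):
            if j not in kept and scores[j] >= low:
                kept.add(j)
                frontier.append(j)
    return sorted(kept)


# Hypothetical scores for four candidates arranged in a chain 0-1-2-3.
scores = [0.9, 0.5, 0.2, 0.5]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
kept = filter_candidates(scores, neighbors)
```

In this example, candidate 1 is kept because it is adjacent to the strong candidate 0, while candidate 3 has the same score but is discarded because its only neighbor falls below the low threshold.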
A memory-augmented attention network for scene text recognition is proposed. Most previous attention-based scene text recognition methods adopt a standard attention network as the decoder and, when decoding the character at the current time step, do not make full use of the character information from before the previous time step or of the alignment information from all historical time steps. To address this problem, the proposed memory-augmented attention network (MAAN) augments the standard attention network in two respects: memory augmentation of historical character information and memory augmentation of historical alignment information. Experimental results on public datasets show that the proposed MAAN outperforms the standard attention network and is comparable or superior to previous state-of-the-art methods.
An attention network with gated embedding for scene text recognition is proposed. The standard attention network relies too heavily on the embedding vector of the character at the previous time step when decoding the character at the current time step. However, the source of the previous character embedding vector differs between the training phase and the test phase. To address this problem, the proposed attention network with gated embedding (GEAN) adaptively resets the input information from the previous character embedding vector by adding an adaptive embedding gate, which is constructed from the degree of correlation between the hidden state vector and the embedding vector of the corresponding character label at the same time step. Experimental results on public datasets show that the recognition performance of the proposed GEAN is superior to that of the standard attention network.
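The idea of an adaptive embedding gate can be sketched as follows. The abstract does not give the gate's formula, so this sketch assumes a scalar gate obtained by passing the dot product of the hidden state and the character embedding through a sigmoid; the actual GEAN gate may be parameterized differently. The function names `embedding_gate` and `gated_embedding` are hypothetical.

```python
import math


def embedding_gate(hidden, embedding):
    """Scalar gate in (0, 1) from the hidden/embedding correlation.

    Assumption: correlation is measured by a plain dot product and
    squashed with a sigmoid; this is an illustration, not the thesis formula.
    """
    score = sum(h * e for h, e in zip(hidden, embedding))
    return 1.0 / (1.0 + math.exp(-score))


def gated_embedding(hidden, embedding):
    """Scale the previous character embedding by the adaptive gate."""
    g = embedding_gate(hidden, embedding)
    return [g * e for e in embedding]


# Hypothetical 2-dimensional vectors: an uncorrelated hidden state halves
# the embedding, while a strongly correlated one passes it almost unchanged.
weak = gated_embedding([0.0, 0.0], [1.0, 2.0])
strong = gated_embedding([5.0, 5.0], [1.0, 2.0])
```

Under this assumption, the decoder receives less of the previous character embedding when that embedding correlates weakly with the hidden state, which reduces the dependence on it that the paragraph above describes.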
A multi-branch guided attention network for irregular text recognition is proposed. The proposed method provides a simple but effective way to handle multiple types of irregularity in irregular text images simultaneously. Through mutual guidance among multi-branch data during training, the proposed multi-branch guided attention network (MBAN) learns a semantic representation of the predicted character sequences that is invariant between regular text images and the corresponding irregular images. It also alleviates the attention drift problem often encountered by the standard attention network, in the sense that the accuracy of the alignment factors at each time step is significantly improved. Experiments on public datasets verify the effectiveness of the proposed MBAN in recognizing irregular text and in alleviating the attention drift problem, and show that its performance is comparable or superior to previous state-of-the-art irregular text recognition methods.