With the widespread use of camera-equipped smart devices and the rapid development of the Internet, the number of available images and videos has grown rapidly. If computers could automatically understand the high-level semantic information contained in images and videos, they could help people manage and use this material efficiently and effectively. Scene text is an important carrier of image semantics, and in recent years the detection and recognition of text in scene images has drawn increasing attention. Taking the characteristics of scene text into full consideration and starting from the perspective of feature representation, this thesis conducts a thorough study of scene text detection and recognition. The main contributions are as follows: 1. Because of the gap between the feature distributions of training and testing samples, a scene text detector cannot guarantee its performance even when a large number of training samples are used in the training stage. To deal with this issue, this thesis proposes a new scene text detection method based on a transferring cascade classifier. The method borrows the idea of transfer learning and starts from the perspective of features: we assume that the discriminative power of each feature is highly scene dependent, and we adjust the classification weights of features online so that scene text is detected adaptively. Specifically, we choose cascade AdaBoost as the scene text detector, provide as many features as possible to the weak classifiers, and modify the weak classifiers' weights according to their ability to classify high-confidence testing samples. Experiments on public datasets demonstrate the effectiveness of this transfer-based detection. 2. 
To introduce both local strokes and global structure into the feature representation of scene text, we propose a new feature representation method based on a discriminative stroke bank, which uses the maximal local response values of multi-scale stroke detectors as features. This representation overcomes two limitations of earlier stroke-based methods: their single scale and the poor discriminative power of the strokes they select. We collect stroke training samples based on keypoint labeling of scene character images, and slide each stroke detector around the positions of its positive strokes to obtain maximal local response values. Introducing response regions reduces computation time, highlights the locality of strokes, and strengthens the discriminative power of feature...
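
The stroke-bank feature described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: each stroke detector is modeled as a plain linear template scored by correlation, its response region is a square window of a given radius around the expected top-left position of the stroke, and the names `max_local_response` and `stroke_bank_features` are hypothetical. The detectors in the thesis are learned from labeled stroke samples and may take a different form.

```python
from typing import List, Sequence, Tuple

Grid = Sequence[Sequence[float]]
# Assumed detector layout: (template, expected top-left (row, col), search radius)
Detector = Tuple[Grid, Tuple[int, int], int]


def correlate_at(image: Grid, template: Grid, top: int, left: int) -> float:
    """Correlation score of `template` placed with its top-left corner at (top, left)."""
    return sum(
        image[top + i][left + j] * weight
        for i, row in enumerate(template)
        for j, weight in enumerate(row)
    )


def max_local_response(
    image: Grid, template: Grid, position: Tuple[int, int], radius: int
) -> float:
    """Slide the template only inside its response region and keep the best score."""
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    row0, col0 = position
    # Clip the response region so the template always stays inside the image.
    tops = range(max(0, row0 - radius), min(img_h - tpl_h, row0 + radius) + 1)
    lefts = range(max(0, col0 - radius), min(img_w - tpl_w, col0 + radius) + 1)
    scores = [correlate_at(image, template, t, l) for t in tops for l in lefts]
    # Assumption: a detector with no valid placement contributes 0.0.
    return max(scores) if scores else 0.0


def stroke_bank_features(image: Grid, bank: Sequence[Detector]) -> List[float]:
    """One feature per detector: its maximal response within its own region."""
    return [max_local_response(image, tpl, pos, r) for tpl, pos, r in bank]


if __name__ == "__main__":
    # Toy 8x8 "character" containing a single vertical stroke in column 3.
    image = [[0.0] * 8 for _ in range(8)]
    for r in (2, 3, 4):
        image[r][3] = 1.0
    vertical_small = [[1.0], [1.0], [1.0]]   # same stroke shape at two scales
    vertical_large = [[1.0]] * 5
    horizontal = [[1.0, 1.0, 1.0]]
    bank = [
        (vertical_small, (1, 3), 1),
        (vertical_large, (1, 3), 1),
        (horizontal, (3, 2), 1),
    ]
    print(stroke_bank_features(image, bank))
```

Restricting each detector to a small window, rather than scanning the whole character image, is what keeps the cost low and ties each feature to a specific location in the character. The resulting vector would then be passed to a character classifier, which is outside the scope of this sketch.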