English Abstract | The number of digital images and videos is growing explosively. These multimedia documents contain a great deal of information that is valuable for many applications, such as content-based video retrieval, yet it is very difficult for computers to extract this information automatically. Text embedded in images and videos is closely related to the surrounding content and can therefore provide key clues for image understanding. Traditional OCR software processes scanned documents effectively, but it encounters great difficulty when the text appears on a complex background. It is therefore an urgent and challenging task to develop a framework that can effectively detect, extract and recognize text against complex backgrounds. Toward this goal, the following research work has been conducted. 1. Edge-based, texture-based and color-based text detection algorithms are presented in this dissertation. The edge-based algorithm uses a color edge detector and connected component analysis to search for text regions. The texture-based algorithm describes texture with local binary patterns and then constructs an NN classifier for texture segmentation. The color-based algorithm uses an adaptive self-organizing map (SOM) for color reduction, after which text regions are detected in each color plane. Finally, we propose a hybrid detection strategy that combines multiple features, i.e. edge, color and texture, to achieve satisfactory performance. 2. We construct a framework for video text detection and extraction that contains seven functional modules. In the detection stage, the edge density feature and a pyramid strategy are used for coarse localization. The rules applied at this stage are deliberately weak, so that a high recall rate is achieved. A multilevel verification strategy is then adopted to eliminate false alarms and improve the precision rate.
In the extraction stage, multiple frames containing the same text are integrated on the basis of a precise polarity estimation algorithm, so that the contrast between the text and the background is enhanced. A novel binarization algorithm, which uses both intensity information and local connected-component (CC) based geometric information, is also proposed to improve the recognition rate. 3. We propose a fast classification strategy for large class sets. A group-based candidate selection rule is first introduced, under which the whole class set is divided into several groups. Adjacent groups overlap one another, so that a high hit rate can be guaranteed. For any unknown sample, the candidate set is its nearest group. We use hierarchical learning vector quantization to optimize the global prototypes, local prototypes and group centroids. Furthermore, we introduce a risk-zone criterion to improve the hit rate for samples located near group boundaries. |
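
The group-based candidate selection described in item 3 can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. It assumes Euclidean distance, one prototype per class, and hand-picked group centroids, and it builds each group from the classes nearest to its centroid. The names `build_groups`, `select_candidates` and `classify` are hypothetical, and the hierarchical LVQ training and the risk-zone criterion are not modeled.

```python
import math


def build_groups(prototypes, centroids, group_size):
    """Assign to each centroid the `group_size` classes whose prototypes are
    nearest to it. Groups may share classes, i.e. adjacent groups overlap.
    (Assumed construction; the abstract does not specify how groups are formed.)"""
    groups = []
    for centroid in centroids:
        ranked = sorted(prototypes,
                        key=lambda label: math.dist(prototypes[label], centroid))
        groups.append(ranked[:group_size])
    return groups


def select_candidates(sample, centroids, groups):
    """Candidate set of a sample = the group with the nearest centroid."""
    nearest = min(range(len(centroids)),
                  key=lambda i: math.dist(sample, centroids[i]))
    return groups[nearest]


def classify(sample, prototypes, centroids, groups):
    """Nearest-prototype decision restricted to the candidate set."""
    candidates = select_candidates(sample, centroids, groups)
    return min(candidates, key=lambda label: math.dist(sample, prototypes[label]))


# Toy data: four classes on a line and two group centroids (all hypothetical).
prototypes = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (2.0, 0.0), "D": (3.0, 0.0)}
centroids = [(0.3, 0.0), (2.5, 0.0)]
sample = (1.45, 0.0)  # closest prototype is "B", closest centroid is the second

disjoint = build_groups(prototypes, centroids, group_size=2)
overlapping = build_groups(prototypes, centroids, group_size=3)

label_disjoint = classify(sample, prototypes, centroids, disjoint)
label_overlapping = classify(sample, prototypes, centroids, overlapping)
```

In this toy setting the sample lies near the boundary between the two groups. With disjoint groups its true nearest class falls outside the candidate set, whereas with overlapping groups the class is still among the candidates. This is the effect the abstract attributes to letting adjacent groups overlap.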