Abstract

Nowadays, with the development of technology, the number of digital images and videos is growing explosively. These multimedia documents contain a great deal of information that is valuable for many applications, such as information retrieval, image classification, and data mining. However, it is still very difficult for computers to understand the contents of images and videos. Text embedded in images and videos is closely related to their content, and consequently it can provide key clues for image and video understanding. Therefore, it is both imperative and challenging to develop a system that can effectively detect, extract, and recognize text in images with complex backgrounds. Toward this goal, the following research has been conducted.
First, to detect text in images with different backgrounds, an adaptive text detection method based on image complexity analysis is proposed. Before text detection, an image complexity analysis step classifies each image into one of three categories: low complexity, middle complexity, or high complexity. Images in different categories then use different methods to extract edge features. For low-complexity images, the simple Sobel operator is applied, and a filter is then convolved with the edge image to further remove noise and enhance text edges. For middle-complexity images, Sobel edge detection is followed by an edge repair method that connects broken edges. For high-complexity images, a morphology-based edge detection operator is designed that removes most of the noise while preserving text edges. The proposed text detection method adopts a coarse-to-fine detection strategy that combines edge-based, connected-component-based, and texture-based methods in a single framework. Experimental results verify the effectiveness of the proposed method.
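
The abstract does not specify the complexity measure, its thresholds, the filter applied after Sobel, the edge repair method, or the designed morphological operator. The sketch below is therefore only a minimal pure-Python illustration of complexity-dependent edge extraction under stated assumptions: complexity is estimated as the fraction of pixels with a strong Sobel response (thresholds `edge_thresh`, `low`, `high` are hypothetical), the post-Sobel filter is assumed to be a 3x3 mean filter, edge repair is approximated by a 3x3 grayscale closing, and the thesis's own high-complexity operator is replaced by the standard morphological gradient (dilation minus erosion).

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values (0-255)


def sobel_magnitude(img: Image) -> Image:
    """Sobel gradient magnitude approximated as |Gx| + |Gy|; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out


def filter3x3(img: Image, reduce: Callable[[Sequence[float]], float]) -> Image:
    """Apply `reduce` (max, min, mean, ...) over each 3x3 window, clipped at the border."""
    h, w = len(img), len(img[0])
    return [[reduce([img[j][i]
                     for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))])
             for x in range(w)]
            for y in range(h)]


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def morphological_gradient(img: Image) -> Image:
    """Standard morphological edge detector: 3x3 dilation minus 3x3 erosion."""
    dilated, eroded = filter3x3(img, max), filter3x3(img, min)
    return [[d - e for d, e in zip(dr, er)] for dr, er in zip(dilated, eroded)]


def classify_complexity(img: Image, edge_thresh: float = 100.0,
                        low: float = 0.05, high: float = 0.20) -> str:
    """Hypothetical complexity measure: fraction of pixels with a strong Sobel response."""
    mag = sobel_magnitude(img)
    strong = sum(1 for row in mag for v in row if v > edge_thresh)
    density = strong / (len(img) * len(img[0]))
    if density < low:
        return "low"
    return "middle" if density < high else "high"


def extract_edges(img: Image) -> Tuple[str, Image]:
    """Pick an edge extractor according to the estimated complexity level."""
    level = classify_complexity(img)
    if level == "low":
        # Sobel, then a smoothing filter (assumed 3x3 mean) to spread out isolated responses.
        return level, filter3x3(sobel_magnitude(img), mean)
    if level == "middle":
        # Sobel, then a stand-in "edge repair": 3x3 grayscale closing (dilate, then erode)
        # fills short gaps between edge fragments.
        return level, filter3x3(filter3x3(sobel_magnitude(img), max), min)
    # High complexity: morphological gradient as a stand-in for the thesis's own operator.
    return level, morphological_gradient(img)
```

For example, a flat gray image is classified as low complexity, a single vertical step edge as middle complexity, and dense vertical stripes as high complexity under these assumed thresholds. A real system would tune the measure and thresholds on data and would likely use an optimized library rather than nested Python loops.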
Second, to extract text from images with complex backgrounds, a text extraction approach based on conditional random fields (CRFs) is put forward, which integrates local image features and context information into one framework. Local information and context information are modeled by state feature functions and transition feature functions, respectively. This paper compares the extraction performance in different color spaces as well as the performance of different features. Experimental results demonstrate the effectiveness of the approach.
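
The abstract states only that local information enters the CRF through state feature functions and context through transition feature functions. As a minimal sketch of that scoring scheme, the code below assumes a binary label per pixel (1 = text, 0 = background), linear state features `x_{i,k} * [y_i == y]` weighted by a hypothetical matrix `state_w`, and a single transition feature `[y_i == y_j]` on 4-connected neighbours weighted by a hypothetical `trans_w`. The conditional probability is `P(y | x) = exp(score(y, x)) / Z(x)`; here `Z` is computed by brute-force enumeration, which is only feasible for toy grids. The per-pixel feature vectors stand in for whichever color-space channels or features are being compared.

```python
import itertools
import math
from typing import Dict, Iterator, Sequence, Tuple

Labeling = Tuple[int, ...]  # one label per pixel (row-major): 1 = text, 0 = background


def grid_edges(h: int, w: int) -> Iterator[Tuple[int, int]]:
    """4-connected neighbouring pixel pairs on an h x w grid (row-major indices)."""
    for y in range(h):
        for x in range(w):
            i = y * w + x
            if x + 1 < w:
                yield i, i + 1
            if y + 1 < h:
                yield i, i + w


def crf_score(labels: Labeling, feats: Sequence[Sequence[float]], h: int, w: int,
              state_w: Sequence[Sequence[float]], trans_w: float) -> float:
    """Unnormalized log-score: weighted state features plus weighted transition features."""
    # State features f_{y,k}(y_i, x_i) = x_{i,k} * [y_i == y], weighted by state_w[y][k].
    s = sum(wk * fk for i, y in enumerate(labels) for wk, fk in zip(state_w[y], feats[i]))
    # Transition feature g(y_i, y_j) = [y_i == y_j] on each neighbouring pair.
    s += trans_w * sum(labels[i] == labels[j] for i, j in grid_edges(h, w))
    return s


def crf_posterior(feats, h, w, state_w, trans_w) -> Dict[Labeling, float]:
    """Exact P(y | x) by enumerating all 2^(h*w) labelings (tiny grids only)."""
    scores = {ys: crf_score(ys, feats, h, w, state_w, trans_w)
              for ys in itertools.product((0, 1), repeat=h * w)}
    top = max(scores.values())  # subtract the max score for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    return {ys: math.exp(s - top) / z for ys, s in scores.items()}
```

As a usage example, take a 1 x 3 image with features `[[1.0], [0.0], [1.0]]` and `state_w = [[-2.0], [2.0]]`: the middle pixel carries no local evidence, so with `trans_w=0.0` its marginal probability of being text is 0.5, while with `trans_w=1.0` the most probable labeling becomes `(1, 1, 1)` because the transition features favour agreement with its text neighbours.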
Third, the standard form of the conditional random field is effective in modeling local information, and local information alone is sometimes sufficient to identify the class of a pixel. However, it becomes insufficient in ambiguous cases. Therefore, a method combining multi-layer context information is proposed, in which local image information, local context information, and contextual label information are integrated into a single conditional random field. Local image information is modeled to predict the category at each image site, while contextual label information is modeled to capture the patterns within the label field. This paper compares the proposed method with other text extraction methods, and the comparison results demonstrate its superiority over them, especially for images with complex backgrounds.
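
The abstract does not give the form of the three context layers, the training procedure, or the inference algorithm, so the following is only a hedged sketch of how such terms could combine in one labeling objective. Assumptions, all hypothetical: `feat` is a signed per-pixel text-likeness score (positive looks like text); local context is the mean of that score over the 3x3 neighbourhood; contextual label information is a weight table `pattern_w` over 2x2 label patches; and inference uses iterated conditional modes (ICM), a simple greedy method swapped in here that finds only a local optimum.

```python
from typing import Dict, List, Optional, Tuple

Grid = List[List[float]]
Pattern = Tuple[int, int, int, int]  # 2x2 label patch: (top-left, top-right, bottom-left, bottom-right)


def local_context(feat: Grid) -> Grid:
    """Local context feature: mean of each site's 3x3 neighbourhood (clipped at the border)."""
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [feat[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out


def pattern_score(labels: List[List[int]], y: int, x: int,
                  pattern_w: Dict[Pattern, float]) -> float:
    """Contextual label term: summed weights of every 2x2 label patch containing (y, x)."""
    h, w = len(labels), len(labels[0])
    s = 0.0
    for py in (y - 1, y):
        for px in (x - 1, x):
            if 0 <= py < h - 1 and 0 <= px < w - 1:
                key = (labels[py][px], labels[py][px + 1],
                       labels[py + 1][px], labels[py + 1][px + 1])
                s += pattern_w.get(key, 0.0)
    return s


def icm_label(feat: Grid, w_img: float = 2.0, w_ctx: float = 1.0,
              pattern_w: Optional[Dict[Pattern, float]] = None,
              sweeps: int = 10) -> List[List[int]]:
    """Greedy (ICM) labelling under local image, local context, and label-pattern terms."""
    pattern_w = pattern_w or {}
    ctx = local_context(feat)
    h, w = len(feat), len(feat[0])
    labels = [[1 if v > 0 else 0 for v in row] for row in feat]  # start from local evidence
    for _ in range(sweeps):
        changed = False
        for y in range(h):
            for x in range(w):
                old = labels[y][x]
                evidence = w_img * feat[y][x] + w_ctx * ctx[y][x]
                scores = []
                for lab in (0, 1):
                    labels[y][x] = lab
                    unary = evidence if lab == 1 else -evidence
                    scores.append(unary + pattern_score(labels, y, x, pattern_w))
                # Keep the current label on ties; otherwise take the higher-scoring one.
                new = old if scores[0] == scores[1] else int(scores[1] > scores[0])
                labels[y][x] = new
                changed = changed or new != old
        if not changed:  # no label changed in a full sweep: local optimum reached
            break
    return labels
```

On a 3 x 3 grid of text-like scores whose centre pixel is weakly background-like (`-0.2`), local image evidence alone (`w_ctx=0.0`, no patterns) leaves the centre as background, whereas either the local context term or a pattern table favouring homogeneous 2x2 patches, such as `{(1, 1, 1, 1): 1.0, (0, 0, 0, 0): 1.0}`, relabels it as text. This mirrors, in toy form, the abstract's point that local information becomes insufficient in ambiguous cases.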