Text Detection, Extraction and Recognition in Complex Background (复杂背景下的文字检测、抽取和识别研究)


Title	复杂背景下的文字检测、抽取和识别研究
Alternative title	Text Detection, Extraction and Recognition in Complex Background
Author	Xu Lei (徐磊)
Date	2007-06-06
Degree type	Doctor of Engineering
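Chinese abstract (translated)	Enabling computers to automatically understand the content of multimedia documents such as images and videos, and using the extracted information to drive further applications, has become an active research topic. Compared with other image information such as color, shape and texture, text embedded in images and videos is usually directly related to the image content; if this text can be detected, extracted and recognized, it provides key information for understanding the content of images and videos. Traditional OCR technology handles high-quality scanned documents effectively, but it runs into many difficulties on images and videos with complex backgrounds, and its performance degrades. Effective solutions are therefore needed in both theory and technique. This dissertation addresses several problems in text detection, extraction and recognition in complex backgrounds. The main contributions are: 1. Edge-based, texture-based and color-based text detection methods for still images are studied separately. The edge-based method detects text regions with a high-performance color edge operator and connected-component analysis. The texture-based method first builds a nearest-neighbor classifier from LBP features and the $\chi^2$ distance, then applies it with a pyramid strategy to segment the image by texture and obtain the detection result. The color-based method proposes a color clustering approach based on an adaptive SOM and detects text separately in each sub-image produced by the clustering. Finally, a text detection method that fuses multiple features is proposed; the complementarity of edge, texture, connected-component and color features raises the text detection rate. 2. Based on the characteristics of video text, a video text detection and extraction system is proposed, composed of modules for coarse video text detection, text block refinement, texture verification, multi-frame verification, multi-frame text enhancement, connected-component-based binarization and text tracking. In the detection stage, coarse detection first uses an edge density feature with a pyramid strategy, with weak rules to ensure a high recall rate; a multilevel verification mechanism then rejects false detections. In the extraction stage, an accurate text polarity estimation method is provided first; on that basis, multi-frame fusion enhances the text, and connected-component information improves the binarization. Finally, a text tracking algorithm is given. Experimental results demonstrate the effectiveness of the system. 3. For classification over large class sets, a new fast classification strategy is proposed that balances recognition rate and recognition speed. Overall, a multilevel classification strategy is adopted, and a redundant group-based candidate rule yields a fixed grouping of the classes. For any unknown sample, the candidate set is its nearest group; the number of groups is limited, and each group can be treated as an independent small-class-set classification problem. More flexible classifier design strategies, including classifier combination and classifier selection, can then be applied to each group. In addition, an algorithm is provided that trains the classifiers with multilevel LVQ to optimize the global prototypes, the prototypes of each group and the group centers.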
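The texture-based detector described in item 1 pairs LBP features with a $\chi^2$ distance and a nearest-neighbor classifier. The following is a minimal sketch of that pairing, not the dissertation's implementation: it assumes the basic 8-neighbor LBP on a grey image given as a list of lists, uses the common form of the $\chi^2$ distance without a 1/2 factor, and omits the pyramid strategy. The function names, the toy patches and the labels "text" and "background" are all illustrative assumptions.

```python
def lbp_histogram(img):
    """Normalized 256-bin histogram of basic 8-neighbor LBP codes (interior pixels only)."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            center = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # A neighbor at least as bright as the center sets its bit.
                if img[y + dy][x + dx] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist) or 1
    return [v / total for v in hist]


def chi2_distance(p, q):
    """Chi-square distance between two histograms; empty bin pairs are skipped."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def nearest_label(hist, prototypes):
    """1-NN: label of the prototype histogram closest to `hist` under chi-square."""
    return min(prototypes, key=lambda item: chi2_distance(hist, item[0]))[1]


# Toy patches (hypothetical data): a flat patch and a vertically striped patch.
flat = [[128] * 6 for _ in range(6)]
stripes = [[255 if x % 2 else 0 for x in range(6)] for _ in range(6)]
prototypes = [(lbp_histogram(flat), "background"), (lbp_histogram(stripes), "text")]

# A striped patch with different grey levels; LBP depends only on local ordering.
query = [[200 if x % 2 else 10 for x in range(6)] for _ in range(6)]
label = nearest_label(lbp_histogram(query), prototypes)
```

Because LBP codes depend only on whether each neighbor is brighter than the center, the query patch yields the same histogram as the striped prototype despite its different contrast.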
English abstract	The number of digital images and videos is growing explosively. These multimedia documents contain a great deal of information that is valuable for many applications, such as content-based video retrieval. However, it is very difficult for computers to extract such information automatically. The text embedded in images and videos is closely related to their content and can therefore provide key clues for image understanding. Traditional OCR software processes scanned images effectively, but it encounters great difficulties when the background is complex. It is therefore an urgent and challenging task to develop a framework that can effectively detect, extract and recognize text in complex backgrounds. Toward this goal, the following research has been conducted. 1. Edge-based, texture-based and color-based text detection algorithms are each presented in this dissertation. The edge-based algorithm uses a color edge detector and connected-component analysis to search for text regions. The texture-based algorithm employs local binary patterns for texture description and then constructs a nearest-neighbor (NN) classifier for texture segmentation. The color-based algorithm uses an adaptive SOM for color reduction, after which text regions are detected in each color plane. Finally, we propose a hybrid detection strategy that combines multiple features, i.e. edge, color and texture, to achieve satisfactory performance. 2. We construct a framework for video text detection and extraction, which contains 7 function modules. In the detection stage, the edge density feature and a pyramid strategy are used for coarse localization. Weak rules are set so that a high recall rate is achieved. A multilevel verification strategy is then adopted to eliminate false alarms and improve the precision rate. In the extraction stage, multiple frames containing the same text are integrated based on a precise polarity estimation algorithm, so that the contrast between the text and the background is enhanced. A novel binarization algorithm, which uses both intensity information and local connected-component (CC) based geometrical information, is also proposed to improve the recognition rate. 3. We propose a fast classification strategy for large class sets. A group-based candidate selection rule is first introduced, and the whole class set is divided into several groups. Adjacent groups overlap each other, so that a high hit rate can be guaranteed. For any unknown sample, the candidate set is its nearest group. We use hierarchical learning vector quantization to optimize the global prototypes, local prototypes and group centroids. Furthermore, we introduce a risk-zone criterion to improve the hit rate of samples located near the group boundaries.
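The group-based candidate selection in item 3 can be pictured with a small sketch. It is an illustration under stated assumptions, not the dissertation's algorithm: the grouping rule (a class joins its nearest group plus any group whose centroid is within a hypothetical `margin` of that nearest distance) is invented here to produce overlapping groups, the prototypes and centroids are fixed toy values rather than trained with hierarchical LVQ, and the risk-zone criterion is omitted.

```python
import math


def build_groups(prototypes, centroids, margin):
    """Put each class in its nearest group and, redundantly, in any group
    whose centroid is at most `margin` farther away than the nearest one."""
    groups = [[] for _ in centroids]
    for label, proto in prototypes.items():
        dists = [math.dist(proto, c) for c in centroids]
        best = min(dists)
        for g, d in enumerate(dists):
            if d <= best + margin:
                groups[g].append(label)
    return groups


def classify(x, prototypes, centroids, groups):
    """Two-level decision: pick the nearest group, then the nearest class in it."""
    g = min(range(len(centroids)), key=lambda i: math.dist(x, centroids[i]))
    return min(groups[g], key=lambda label: math.dist(x, prototypes[label]))


# Toy class prototypes and group centroids (hypothetical values).
prototypes = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (2.0, 0.0), "D": (3.0, 0.0)}
centroids = [(0.5, 0.0), (2.5, 0.0)]

groups = build_groups(prototypes, centroids, margin=1.2)
predicted = classify((1.6, 0.0), prototypes, centroids, groups)
```

With these toy values the boundary classes B and C belong to both groups, so a sample near the group boundary still finds its true nearest class among the candidates, while only one group's prototypes are compared at the second level.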
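Keywords	Image Text Detection; Video Text Detection; Video Text Extraction; Large Class Sets; Fast Classification Strategy
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6005
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	Xu Lei. Text Detection, Extraction and Recognition in Complex Background [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of the Chinese Academy of Sciences, 2007.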

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20041801462800 (6380KB)			Not yet open	CC BY-NC-SA