Research on Text Detection and Recognition Methods in Natural Scenes

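Title	Research on Text Detection and Recognition Methods in Natural Scenes (自然场景中文字检测与识别方法研究)
Alternative title	Text detection and recognition in natural scenes
Author	史存召 (Shi Cunzhao)
Date	2014-05-29
Degree type	Doctor of Engineering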
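Chinese abstract (translated)	With the rapid development of the mobile internet and the wide adoption of camera-equipped smart devices, automatically understanding the high-level semantic information in user-captured images and videos has great application potential. Text in an image carries semantic information directly, so automatic detection and recognition of text in digital images has attracted wide attention from researchers at home and abroad. Text recognition for scanned documents is now mature. However, because text in natural scenes varies in position, size, font, illumination, viewing angle, and deformation, and because backgrounds are complex, scene text detection and recognition still present many technical difficulties. Considering the intra-class variability of scene text and the uncertainty of the background, as well as the complexity and interdisciplinary nature of the scene text recognition problem, this dissertation draws on recent advances in image processing, object detection, pattern classification, and machine learning to study the sub-problems of text detection, text extraction, and text recognition, and builds on these studies to propose a scene text recognition method that integrates detection and recognition. The main work and contributions are as follows: 1. Because text and background in natural scenes are highly variable and training samples are limited, a single information source or a single classifier cannot reliably separate text from non-text regions. To address this, the dissertation proposes graph-model-based scene text detection methods that use context and fuse multiple information sources within a graph-model framework to improve detection performance. It first proposes a graph-model-based background suppression method that treats pixels as graph nodes, fuses region classifier results, color, and gradient information into the loss function of the graph model, and obtains the optimal background suppression by minimizing that loss function. Experimental results show that this method outperforms other preprocessing methods. To exploit contextual information more fully, the dissertation proposes a scene text detection method based on a graph model built on maximally stable extremal regions, which treats these regions as graph nodes and fuses multiple information sources into one framework so that they complement one another. The method is insensitive to scale and, because it takes context into account, adapts well. Experimental results show that it achieves good text/non-text extremal region classification and good overall text detection performance. 2. Binarization methods designed for scanned documents fail on text block images with complex backgrounds. To address this, the dissertation proposes an adaptive graph-cut-based method for extracting text from images with complex backgrounds. To handle the noise caused by non-uniform backgrounds, it proposes a split-then-merge approach: the text block image is coarsely divided into several sub-images, and each sub-image is processed separately. To exploit the stroke information specific to text, it designs an algorithm that automatically supplies high-confidence foreground and background points as hard constraints for graph cut; combined with soft constraints, the graph cut algorithm propagates the hard constraints to the whole sub-image, separating text strokes from the background. Experimental results on video text images verify the effectiveness of the method for segmenting text block images with complex backgrounds. 3. To exploit the structural information specific to text, the dissertation proposes two scene character recognition methods that incorporate structural information. To obtain a structure-based representation of characters, it first proposes a scene character recognition method based on a multi-scale graph matching kernel, which represents a character as an undirected graph under multi-scale grid partitioning and computes the similarity between two images by graph matching. Because the matching process uses the structural invariance constraints of characters, the method can cope with scene characters that are deformed to some degree. Experimental results verify its effectiveness. To use character-specific structural information more directly and fully, the dissertation proposes a structure-guided scene character recognition method, which represents each character class as...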
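The graph-cut step in contribution 2 (high-confidence foreground and background points as hard constraints, neighbor similarity as soft constraints) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the dissertation: the function names `source_side_of_min_cut` and `segment_text`, the integer edge weights, the hand-picked seeds (the dissertation selects them automatically from stroke information), and the plain Edmonds-Karp solver. The split into sub-images is omitted.

```python
from collections import deque

HARD = 10**9  # effectively infinite capacity for hard-constraint links


def source_side_of_min_cut(n_nodes, edges, source, sink):
    """Edmonds-Karp max-flow; returns the nodes still reachable from source."""
    cap = [dict() for _ in range(n_nodes)]
    for u, v, c in edges:
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # residual (reverse) edge
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return set(parent)  # no path left: this is the source side of the cut
        # Find the bottleneck capacity along the path, then push flow.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u


def segment_text(image, fg_seeds, bg_seeds, smoothness=100):
    """Binary mask (1 = stroke) from a gray image and hand-picked seed pixels."""
    h, w = len(image), len(image[0])
    source, sink = h * w, h * w + 1

    def idx(r, c):
        return r * w + c

    # Hard constraints: seeds are tied to their terminal with huge capacity.
    edges = [(source, idx(r, c), HARD) for r, c in fg_seeds]
    edges += [(idx(r, c), sink, HARD) for r, c in bg_seeds]
    # Soft constraints: similar 4-neighbors are expensive to separate.
    for r in range(h):
        for c in range(w):
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < h and cc < w:
                    diff = abs(image[r][c] - image[rr][cc])
                    weight = max(1, smoothness // (1 + diff))
                    edges.append((idx(r, c), idx(rr, cc), weight))
                    edges.append((idx(rr, cc), idx(r, c), weight))
    fg = source_side_of_min_cut(h * w + 2, edges, source, sink)
    return [[1 if idx(r, c) in fg else 0 for c in range(w)] for r in range(h)]


# Toy text block: dark stroke pixels on a bright, slightly uneven background.
image = [
    [210, 205, 40, 50, 200],
    [215, 45, 35, 195, 205],
    [220, 210, 42, 198, 202],
]
mask = segment_text(image, fg_seeds=[(1, 2)], bg_seeds=[(0, 0), (2, 4)])
```

In this toy case a single foreground seed is enough for the cut to label the whole connected dark stroke as foreground, because separating similar neighbors costs far more than cutting along the stroke boundary.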
English abstract	With the rapid growth of the mobile internet and of camera-based applications readily available on smart phones and portable devices, semantically understanding the pictures or videos taken by these devices has many potential applications. Among all the information contained in an image, text, which carries semantic information, can provide valuable cues about the content of the image. Therefore, automatically detecting and recognizing text in natural images has gained increasing attention from the computer vision community. The performance of text recognition on scanned documents is now quite satisfactory. However, due to the unconstrained position, size, font, illumination, and deformation of text and the complexity of the background in natural images, scene text detection and recognition remain challenging and their performance is far from satisfactory. Considering the intra-class variations of scene characters and the uncertainty of the background, as well as the complexity and cross-cutting nature of scene text recognition, this dissertation presents an in-depth study of the individual sub-problems of scene text recognition: text detection, text extraction, and text recognition, combining the latest advances in image processing, object detection, pattern classification, and machine learning. Moreover, based on the research on the individual tasks, we propose an end-to-end scene text recognition framework. The main contributions of this dissertation are summarized as follows: 1. Due to the high degree of intra-class variation of scene characters and the limited number of training samples, a single information source or classifier is not enough to separate text from non-text background. To cope with these problems, we propose two scene text detection methods that use graph models to incorporate various information sources into one framework so as to improve the performance of text detection.
First, we propose a graph-based background suppression method for scene text detection. Considering each pixel as a node in the graph, our approach incorporates region-based classification results, color, and gradient information into the cost function, which is optimized to get the best background suppression result. Experimental results demonstrate the superiority of our method over other preprocessing methods. To make full use of the contextual information, we propose a novel scene text detection approach using a graph model built upon Maximally Stable ...
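The pixel-graph cost function described for background suppression can be sketched as a per-pixel classifier term plus a neighbor-consistency term. The sketch below is a hedged illustration only. The abstract does not give the exact cost function or the optimizer, so the energy form, the use of gray-level difference as a stand-in for color and gradient information, the parameter `lam`, the function name `suppress_background`, and the greedy optimizer (iterated conditional modes, swapped in for simplicity) are all assumptions.

```python
def suppress_background(text_prob, gray, lam=1.0, sweeps=5):
    """Label each pixel 1 (keep as text) or 0 (suppress as background).

    Greedily lowers the assumed energy
        E = sum_i unary_i(l_i) + lam * sum_{i~j} w_ij * [l_i != l_j]
    where unary_i comes from a classifier score in [0, 1] and w_ij is
    large when neighboring pixels have similar gray levels.
    """
    h, w = len(gray), len(gray[0])
    # Start from the classifier decision alone.
    labels = [[1 if text_prob[r][c] >= 0.5 else 0 for c in range(w)]
              for r in range(h)]

    def local_cost(r, c, lab):
        p = text_prob[r][c]
        cost = (1.0 - p) if lab == 1 else p  # disagreement with the classifier
        for rr, cc in ((r, c + 1), (r + 1, c), (r, c - 1), (r - 1, c)):
            if 0 <= rr < h and 0 <= cc < w and labels[rr][cc] != lab:
                # Penalize label changes between similar-looking neighbors.
                cost += lam / (1.0 + abs(gray[r][c] - gray[rr][cc]))
        return cost

    for _ in range(sweeps):
        changed = False
        for r in range(h):
            for c in range(w):
                best = 0 if local_cost(r, c, 0) <= local_cost(r, c, 1) else 1
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:
            break  # reached a local minimum of the energy
    return labels


# A lone false positive (score 0.6) inside a flat background patch.
gray = [[100, 102, 101], [99, 100, 103], [101, 100, 102]]
text_prob = [[0.1, 0.2, 0.1], [0.2, 0.6, 0.1], [0.1, 0.2, 0.1]]
cleaned = suppress_background(text_prob, gray)
```

Here the center pixel passes the classifier threshold on its own, but its neighbors look the same and are all background, so the pairwise term outweighs the classifier term and the pixel is suppressed. This is the sense in which context compensates for a single unreliable information source.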
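Keywords	Text Detection; Graph Model; Graph Cut; Character Recognition; Structure Information; Text Recognition; Tree-structured Model
Language	Chinese
Document type	Dissertation
Item identifier	http://ir.ia.ac.cn/handle/173211/6640
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	Shi Cunzhao (史存召). Research on Text Detection and Recognition Methods in Natural Scenes (自然场景中文字检测与识别方法研究)[D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2014.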

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20111801462805 (14614 KB)			Not currently open	CC BY-NC-SA