Study on Some Issues in Camera-Based Character Recognition

	基于数码影像的文字识别技术中若干问题研究
Alternative title	Study on Some Issues in Camera-Based Character Recognition
	马懿超
	2007-06-06
Degree type	Doctor of Engineering
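Chinese abstract (translated)	In recent years, as high-resolution digital imaging devices have become widespread, they have gained clear advantages over scanners as tools for capturing text images: they are portable, simple to operate, and can capture images without contact. The OCR community has therefore begun to port traditional scanner-based character recognition systems to digital imaging devices, which presents both new opportunities and considerable challenges. Because the imaging mechanisms of cameras and scanners differ and the objects being processed are more complex, traditional text-processing software designed for scanned images is not fully applicable to camera-captured images. For example, document images captured by digital cameras often exhibit distortions such as perspective distortion and curved distortion, and scene images taken with digital cameras often have complex backgrounds in which the text is embedded. Traditional scanner-based OCR techniques take these factors into account rarely or not at all, and they severely degrade OCR performance on camera-captured images. This dissertation studies several pressing problems in camera-based character recognition. Its main contributions are: 1. For perspective distortion in document images, an integrated perspective distortion correction algorithm for small documents is proposed. Given the characteristics of small documents (small area, few characters, relatively complex layout), the correction extracts the straight lines of the document's outer boundary and combines them with the text information inside the document. For detecting the outer boundary lines, a multi-feature integrated detection method is proposed. This integrated correction method makes full use of the structural information of the document in the image and effectively restores small document images. 2. For curved distortion in book document images, a restoration algorithm based on text-line curve fitting and image warping is proposed for correcting curved distortion from a single book document image. Because fitting the text-line curves is critical to correct rectification in this kind of algorithm, a graph-model-based, locally optimal text-line curve detection algorithm is proposed. Based on the detected curves, a curve-filtering method that relies on the local continuity of text-line curves is proposed to revise them, and image warping is then used to restore the image. This correction method is robust to the type of document distortion and to the degree of text-line curvature. 3. For text detection against complex backgrounds such as natural scene images, a text-string detection algorithm based on text distribution features is proposed. Exploiting the key factor that distinguishes text strings from other objects, namely their distribution information, a family of stripe features is proposed to describe the horizontal distribution of text strings. In light of the characteristics of the text detection task, an AdaBoost algorithm biased toward positive samples is proposed as the learning mechanism for feature selection and classifier construction. In the later detection stage, the vertical distribution information of text strings is used, through projection analysis and connected-component analysis, to refine the detection of text strings. The method effectively detects text-string regions under a variety of conditions and achieves high detection accuracy. 4. Performance evaluation metrics for text detection algorithms are proposed for specific application requirements: (1) a set of metrics for recognition tasks, which objectively describes the detection characteristics of the algorithm under evaluation and shows evaluators and designers its strengths and weaknesses under different requirements, facilitating the comparison and selection of algorithms; (2) a set of metrics for detection tasks, which gives the algorithm's detection performance on given ground-truth regions, is independent of the area of those regions, and introduces a false-detection area ratio to describe text detection algorithms in finer detail.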
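As an illustration of the final rectification step in contribution 1, the sketch below maps four document corners onto an upright rectangle with a plane homography. It is a minimal sketch under assumptions, not the thesis's algorithm: the record does not say how the boundary lines and interior text information are combined, so the corners are simply taken as given, and the helper names `solve`, `homography`, and `apply_h` and all coordinates are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """Eight homography parameters (ninth fixed to 1) from four point pairs."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    return solve(rows, rhs)


def apply_h(h, x, y):
    """Map one point through the homography."""
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


# Hypothetical corners of a small document in a photo (clockwise from top-left)
# and the upright rectangle they should map to.
corners = [(12, 20), (205, 35), (190, 150), (30, 140)]
target = [(0, 0), (180, 0), (180, 110), (0, 110)]
H = homography(corners, target)
print([apply_h(H, x, y) for x, y in corners])
```

In practice the inverse mapping would be used to resample every pixel of the output rectangle from the photo; only the coordinate transform is shown here.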
English abstract	In the past few years, digital imaging devices such as cameras have become increasingly popular. Cameras have advantages over scanners for capturing images of text: they are more portable, more convenient, and contact-free. However, differences in imaging mechanism between cameras and scanners, together with the more complex objects being handled, mean that traditional OCR software cannot be applied unchanged to text captured by cameras. Factors that traditional scanner-based OCR technology considers seldom or not at all seriously degrade OCR performance on camera-based text images. This dissertation studies camera-based character recognition technologies. Its major contributions are: 1. We propose an integrated perspective distortion correction method for small-square document images. It exploits the characteristics of small-square documents: small area, few words, and complicated layout. The boundaries of the small-square document region, together with the text information in the document, are used to correct the distortion. In the boundary detection step, multiple features are combined to detect the boundaries correctly. 2. We propose a method based on text-curve fitting that uses a single image to correct curved distortion in bound document images. In this kind of correction method, the crucial step is to locate the curved text lines automatically and accurately. We propose a graph-based, locally optimized text-curve detection method. The algorithm is robust to the type of document distortion and to the degree of curvature of the text lines. After the text curves are detected, their local continuity is used as a filtering strategy to revise the detected curves. Image warping methods are then used to correct the image. 3. We propose a text-string detection method based on text distribution information. The algorithm exploits distribution information, which is the crucial difference between text strings and other objects. A group of stripe features is proposed to represent the horizontal distribution information of text strings. In consideration of the text detection task, a positive-data-biased AdaBoost algorithm is proposed as the feature selection and classifier construction mechanism. In the later stage of detection, the vertical distribution information of text strings is used to further filter out non-text regions and make localization more precise. 4. New methods are proposed to evaluate the performance of text localization algorithms under different usage conditions: 1) a set of metrics for recognition usage, which describes the behavior of the detection method; 2) a set of metrics for detection usage, which gives an object-level recall and precision evaluation whose scores are independent of the sizes of the ground truths. To describe a text detection method in detail, the sizes of false-alarm regions are also considered.
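The detection-usage metrics in contribution 4 can be illustrated with a small sketch. The abstract states only that recall and precision are computed at the object level, that scores do not depend on ground-truth size, and that false-alarm area is taken into account; the matching rule, the `min_cover` threshold, and the names `area`, `overlap`, and `evaluate` below are assumptions made for illustration, not the thesis's definitions.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) with x1 > x0 and y1 > y0


def area(b: Box) -> int:
    """Area of an axis-aligned box."""
    return max(0, b[2] - b[0]) * max(0, b[3] - b[1])


def overlap(a: Box, b: Box) -> int:
    """Area of the intersection of two axis-aligned boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def evaluate(truths: List[Box], detections: List[Box],
             image_area: int, min_cover: float = 0.5):
    """Return (recall, precision, false_alarm_area_ratio).

    Assumed rule: a ground-truth box is found when one detection covers at
    least min_cover of its area; a detection is correct when at least
    min_cover of its own area lies on one ground-truth box. Every box counts
    once, so large and small ground truths weigh the same.
    """
    found = sum(
        1 for t in truths
        if any(overlap(t, d) >= min_cover * area(t) for d in detections)
    )
    correct = [
        any(overlap(t, d) >= min_cover * area(d) for t in truths)
        for d in detections
    ]
    recall = found / len(truths) if truths else 0.0
    precision = sum(correct) / len(detections) if detections else 0.0
    # Area of the detections that match no ground truth, relative to the image.
    false_area = sum(area(d) for d, ok in zip(detections, correct) if not ok)
    return recall, precision, false_area / image_area


# Hypothetical example: one small and one large ground-truth text region.
truths = [(0, 0, 10, 10), (100, 100, 300, 200)]
detections = [(0, 0, 10, 8), (50, 50, 60, 60)]
print(evaluate(truths, detections, image_area=400 * 300))
```

Counting boxes rather than pixels is what keeps the small region from being swamped by the large one; the separate area ratio then reports how much of the image the false alarms occupy.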
Keywords	Document Image; Perspective Distortion; Curved Distortion; Text Detection
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6004
Collection	Graduates_Doctoral Dissertations
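Recommended citation (GB/T 7714)	马懿超. 基于数码影像的文字识别技术中若干问题研究 [Study on Some Issues in Camera-Based Character Recognition][D]. Institute of Automation, Chinese Academy of Sciences. Graduate School of the Chinese Academy of Sciences, 2007.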

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20041801462800 (2765KB)			Not yet open	CC BY-NC-SA