Research on Text Detection and Extraction in Images and Videos (图像与视频中文本检测与提取方法研究)


Alternative title	Research on Text Detection and Extraction in Images and Videos
	白博
	2014-06-04
Degree type	Doctor of Engineering
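Chinese abstract (English translation)	With the spread of digital image and video capture devices (for example, digital cameras, digital camcorders, smartphones, and tablet computers) and the increasingly close ties between the Internet (for example, microblogs, WeChat, and shopping websites) and people's daily lives, the number of images and videos on the Internet is growing explosively. As a form of high-level semantic information, text in images and videos is attracting growing attention because of its inherent advantages: (1) it is highly relevant to the content of the image or video, (2) it is easier to extract than other objects, and (3) the semantic information it carries is easier for computers to understand. To obtain the semantic information contained in such text more accurately and efficiently, and thereby to support content-based retrieval, classification, recommendation, filtering, and similar functions, the automatic localization, extraction, and recognition of text in images and videos by computer has become a research focus in recent years. Against this background, this dissertation draws on techniques from image processing, pattern recognition, machine learning, and related fields to study in depth the localization and extraction of text in images and videos. Compared with existing methods, the proposed methods show clear advantages in precision and recall, and they have been put to practical use in some fields. The novel contributions of this dissertation are summarized as follows. (1) A natural scene text detection method based on the gradient local correlation function is proposed. The method localizes text in an image following a coarse-to-fine strategy. In the coarse localization stage, the gradient local correlation function is used to take full account of the inherent properties of text regions (stroke width consistency and stroke color consistency) and to produce a text confidence map; candidate text regions are then obtained through image segmentation and connected component classification. In the fine localization stage, the final detection result is obtained by suitably expanding the candidate text regions, segmenting them finely, classifying text lines, and splitting them into words. Experimental results on a public dataset show that the proposed method not only outperforms existing methods in precision and recall but also achieves excellent results in text image segmentation. (2) A natural scene text extraction method based on seed points and semi-supervised segmentation is proposed. The gradient local correlation function is first used to estimate the character polarity, stroke width, and other properties of the text region, so that foreground and background seed points are generated automatically; the color and position information provided by the seed points is then used to perform the final segmentation of the image, with one method based on the quadratic discriminant function (QDF) and another based on minimum tree cut (MTC). Experiments show that both methods outperform existing methods in precision and recall, and that their performance is complementary to some extent. (3) A fast video text detection and extraction method based on stroke features is proposed. The method exploits three characteristics of the edge image of caption regions in video (high edge density, diverse edge orientations, and edge points with opposite gradient directions occurring in pairs) to compute stroke features quickly, so that text appearing in video can be localized accurately in real time. A whole-line scoring method is then used to binarize the detected text images quickly, producing binary text images suitable for character recognition. Experimental results on a public dataset show that the method is both accurate and efficient. The method has been successfully applied in a practical system for extracting content from web video.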
English abstract	With the popularity of digital image and video capture devices (such as digital cameras, digital camcorders, smartphones and tablet PCs) and the increasingly close connection between people's lives and the Internet (via, for example, microblogs, WeChat and shopping sites), more and more images and videos are generated and transmitted every day. As a kind of high-level semantic content, text in images and videos has attracted great interest, since it is not only very useful for describing the contents of an image or video, but can also be extracted and understood by computers more easily than other semantic information. As a means of achieving content-based image/video retrieval, classification, recommendation, filtering, etc., text detection, extraction and recognition in images and videos have been receiving increasing attention in recent years. With this aim, this dissertation presents an in-depth study of text detection and extraction by combining techniques from image processing, pattern recognition and machine learning. Compared to existing text detection methods, the proposed methods have clear advantages in terms of precision and recall. Some of our techniques have been applied in practical text information extraction systems. The contributions of this dissertation are summarized as follows. (1) We propose an efficient scene text localization method using gradient local correlation and a coarse-to-fine strategy. In the coarse stage, the gradient local correlation is used to characterize the density of pairwise edges with opposite gradients and consistent stroke width. From the text confidence map calculated from the gradient local correlation, we obtain the candidate text regions by image segmentation and connected component analysis. In the fine stage, the candidate regions are filtered and refined by text line classification and word segmentation. In our experiments on a public dataset, the proposed method was shown to outperform existing methods in terms of both recall and precision. (2) We propose a scene text segmentation method based on seed points and semi-supervised learning. After the estimation of text polarity and stroke width using gradient local correlation, the points in the middle of stroke edge pairs satisfying the width and polarity are taken as foreground seeds, and the points in the middle of the edge pairs with opposite polarity are taken as background seeds. The whole image is then segmented into text and background based ...
Keywords	Scene Text Detection; Text Detection in Video; Gradient Local Correlation Function; Text Extraction; Text Segmentation
Language	Chinese
Document type	Dissertation
Item identifier	http://ir.ia.ac.cn/handle/173211/6651
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	白博. 图像与视频中文本检测与提取方法研究[D]. 中国科学院自动化研究所. 中国科学院大学,2014.

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20101801462802 (3169 KB)			Not currently open	CC BY-NC-SA