Research on Text Detection and Extraction Techniques for Complex Backgrounds in Digital Images (数字图像复杂背景中文本检测与抽取技术研究)
李敏花
Degree type: Doctor of Engineering
Supervisor: 王春恒
2009-05
Degree-granting institution: 中国科学院研究生院
Degree conferred at: 中国科学院自动化研究所
Major: Pattern Recognition and Intelligent Systems
Keywords: text detection, text extraction, image complexity analysis, complex background, mathematical morphology, conditional random field (CRF), context information
Abstract: With the widespread use of image acquisition devices such as digital cameras, digital camcorders, webcams, and high-speed scanners, multimedia information dominated by digital images and video is rapidly becoming the mainstream of information exchange and services. Enabling computers to automatically understand and exploit the content of multimedia documents such as images and video has become a hot topic in image processing and multimedia research. Since text embedded in images and video directly carries semantic information, it provides an important clue for understanding image content. To detect, extract, and ultimately recognize text in images, this thesis studies the detection and extraction of text information against complex backgrounds. The main contributions are as follows.

First, for images of differing complexity, an adaptive hybrid-edge text detection method based on image complexity analysis is proposed. The method first analyzes image complexity and classifies it into three levels: low, middle, and high; it then adaptively selects a suitable edge detection method for each level. For low-complexity images, the Sobel operator is used for edge detection. For middle-complexity images, Sobel edge detection is followed by an edge linking step that uses gradient magnitude and direction to reconnect broken edges. For high-complexity images, a multi-scale, multi-orientation morphological edge detector is designed that removes noise as far as possible while preserving text edges as completely as possible. This complexity-based text detection method combines edge-based, connected-component-based, and texture-based approaches in a coarse-to-fine, multi-stage detection and verification strategy, which improves the text detection rate. Comparisons with single edge detection methods on both a scene image set and a video image set demonstrate the effectiveness of the proposed method.

Second, for text extraction from complex backgrounds, a text extraction method based on conditional random fields is proposed. It fuses low-level text features such as color and texture together with spatial context information in a single conditional random field model, in which state feature functions and transition feature functions describe the low-level image features and the spatial context features, respectively. For the low-level image features, Gabor texture features are added on top of color features. The influence of different color spaces and different features on the performance of the CRF-based text extraction method is compared, and the extraction performance of the method is verified.

Third, to address the problem that low-level image information alone cannot effectively distinguish text pixels from background pixels in complex backgrounds, a conditional random field text extraction method based on multi-layer context information is proposed on top of the standard CRF. In this method, image features such as color and texture serve as local information, while label-field context serves as a kind of global information; by describing the overall label distribution in the image, it can correct classification errors caused by the low-level image information. Comparisons with other text extraction methods on both simple and complex backgrounds demonstrate the effectiveness of the proposed multi-layer-context CRF method for text extraction, especially against complex backgrounds.
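As a rough illustration of the adaptive edge detection described above, the Python/OpenCV sketch below switches detectors according to an image complexity score. The complexity measure (density of strong-gradient pixels), the thresholds LOW_T and HIGH_T, and the use of a morphological closing in place of the thesis's gradient magnitude/direction edge linking are all illustrative assumptions rather than the thesis's actual definitions; the multi-orientation aspect of the morphological detector is likewise omitted here.

import cv2
import numpy as np

LOW_T, HIGH_T = 0.05, 0.15  # hypothetical complexity thresholds, not from the thesis

def image_complexity(gray):
    # Illustrative complexity proxy: fraction of strong-gradient pixels.
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag = cv2.magnitude(gx, gy)
    return float(np.mean(mag > 0.25 * mag.max())) if mag.max() > 0 else 0.0

def adaptive_edges(gray):
    c = image_complexity(gray)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    sobel_mag = cv2.magnitude(gx, gy)
    if c < LOW_T:
        # Low complexity: plain Sobel edge map.
        return sobel_mag
    if c < HIGH_T:
        # Middle complexity: Sobel edges plus a closing, standing in for the
        # gradient-direction edge linking used in the thesis.
        return cv2.morphologyEx(sobel_mag, cv2.MORPH_CLOSE, np.ones((3, 3), np.uint8))
    # High complexity: multi-scale morphological gradient, keeping the strongest
    # response per pixel to suppress noise while preserving stroke edges.
    edges = np.zeros(gray.shape, np.float32)
    for k in (3, 5, 7):
        se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (k, k))
        grad = cv2.morphologyEx(gray, cv2.MORPH_GRADIENT, se).astype(np.float32)
        edges = np.maximum(edges, grad)
    return edges

The resulting candidate edge map would then feed the coarse-to-fine verification stages (connected component analysis and texture verification) mentioned in the abstract.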
Other abstract: Nowadays the amount of digital images and videos is increasing explosively with the development of high technology. These multimedia documents contain a great deal of information that is valuable for many applications, such as information retrieval, image classification, and data mining. However, it is still very difficult for computers to understand the contents of images and videos. Text information embedded in images and videos is highly related to the current content, and consequently it can provide key clues for image and video understanding. Therefore, it is imperative and challenging to develop a system that can effectively detect, extract, and recognize text information from images with complex backgrounds. Aiming at this goal, the following research work has been conducted.

Firstly, to detect text from images with different backgrounds, an adaptive text detection method based on image complexity analysis is proposed. Before text detection, this approach adopts an image complexity analysis step to classify image complexity into three categories: low, middle, and high. Images of different complexity then use different methods to extract edge features. For images with low complexity, the simple Sobel operator is adopted, and a filter is then convolved with the edge image to further remove noise and enhance text edges. For images with middle complexity, after edge detection with the Sobel operator, an edge repair method is used to connect broken edges. For images with high complexity, a morphology-based edge detection operator is designed which removes most of the noise while keeping text edges. The proposed text detection method takes a coarse-to-fine detection strategy that combines the edge-based, connected-component-based, and texture-based methods into one framework. Experimental results verify the performance of the proposed method.

Secondly, to extract text from images with complex backgrounds, a text extraction approach based on conditional random fields is put forward, which integrates local image features and context information into one framework. Local information and context information are modeled by state feature functions and transition feature functions, respectively. This thesis compares the extraction performance in different color spaces and also compares the performance of different features. Experimental results demonstrate its performance.

Thirdly, the standard form of the conditional random field is effective in modeling local information. Local information alone is sometimes sufficient to identify the class of a pixel, but it becomes insufficient in ambiguous cases. Therefore, a method combining multi-layer context information is proposed, in which local image information, local context information, and contextual label information are integrated into a single conditional random field. Local image information is modeled to predict the category of image sites, while contextual label information is modeled to capture the patterns within the label field. This thesis compares the proposed method with other text extraction methods. The comparison results demonstrate its superiority over other methods for text extraction, especially with complex backgrounds.
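For reference, the state and transition feature functions mentioned above correspond to the standard pairwise conditional random field factorization over image sites; the notation below is the generic textbook form, not the thesis's exact model:

P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \exp\Big( \sum_{i} \sum_{k} \lambda_k\, s_k(y_i, \mathbf{x}, i) + \sum_{(i,j) \in \mathcal{E}} \sum_{l} \mu_l\, t_l(y_i, y_j, \mathbf{x}) \Big)

Here s_k are state feature functions scoring local evidence (e.g., color and Gabor texture responses) at site i, t_l are transition feature functions scoring label pairs on neighboring sites (i, j) in the edge set \mathcal{E}, \lambda_k and \mu_l are learned weights, and Z(\mathbf{x}) is the partition function that normalizes the distribution.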
Language: Chinese
Document type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/11971
Collection: 毕业生_博士学位论文 (Graduates / Doctoral theses)
Author's affiliation: 中国科学院自动化研究所
Recommended citation (GB/T 7714):
李敏花. 数字图像复杂背景中文本检测与抽取技术研究[D]. 中国科学院自动化研究所. 中国科学院研究生院,2009.
Files in this item:
File name / size: 200618014628033李敏花.p (31540 KB)
Document type: Thesis
Access: Restricted (not yet open)
License: CC BY-NC-SA