Research and Application of Text Detection Techniques in Natural Scenes (自然场景中文本检测技术的研究与应用)
Li Gen (李根)
2017-05-31
Degree type: Master of Engineering
Chinese Abstract

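With the rapid development of Internet technology and the wide adoption of portable digital devices, images, videos, and other multimedia data, which are rich in form and highly expressive, have gradually become important carriers for obtaining and expressing information. This multimedia data contains a large number of natural scene images that carry rich semantic information, and understanding that information greatly helps the analysis of multimedia data. Text in natural scene images is a key cue for understanding and describing scene semantics, which is why research on extracting text information from scene images has emerged. Unlike optical character recognition (OCR) of scanned documents, we divide text information extraction in natural scenes into three steps: text detection, text segmentation, and text recognition. Text detection is the first step of the whole process, and its accuracy directly determines the overall performance. Text detection in natural scenes faces many difficulties. Scene images suffer from distortions such as blur, occlusion, and uneven illumination. They also contain text-like background clutter such as leaves, windows, and fences. In addition, the text itself varies widely in language, color, font, and orientation. Achieving efficient and accurate text detection is therefore a highly challenging task.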

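This thesis studies text detection in natural scenes in depth. Drawing on pattern recognition, deep learning, and related fields, it proposes two text detection algorithms, each targeting specific difficulties of the task. Compared with mainstream existing methods, the algorithms improve both accuracy and speed. The thesis also applies the proposed algorithms to court place card recognition and implements a prototype program for scene text information extraction. The main contributions are summarized as follows: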

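1. This thesis proposes a bottom-up cascaded filtering text detection algorithm. Through multiple levels of cascaded filtering, it addresses missed detections of blurry text as well as false alarms caused by text-like objects, improving the precision of the detection system. First, the algorithm uses the extremal region (ER) operator, which has a very high single-character recall, to extract character candidates from multi-channel images. To reduce false alarms, it then filters these candidates using geometric prior information and local image features. Next, it links the characters into text lines and introduces the concept of text entropy. Text entropy is combined with deep image features to verify the text-line candidates a second time. Filtering at different levels ultimately ensures the precision and robustness of the algorithm. Experiments show that the algorithm performs well on two public datasets of the International Conference on Document Analysis and Recognition (ICDAR), improving both recall and precision over mainstream existing algorithms.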

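2. Building on deep learning research in object detection, this thesis proposes a text detection algorithm based on a character detection network. The algorithm constructs a Fast Region-based Convolutional Neural Network (Fast R-CNN) that detects individual characters, addressing multilingual text detection. First, it proposes a maximally stable extremal region (MSER) extraction method based on merging neighboring regions, which effectively solves the problem of broken connected components in languages such as Chinese. Then it maps the bounding boxes of the character candidates onto the Fast R-CNN feature map and extracts fixed-length deep features. These features are used to determine (1) whether the candidate is a character and (2) whether the candidate is an end point of a text line. Finally, heuristic rules link the characters into text lines. Experiments show that the character detection network algorithm detects text in different languages across a variety of scenes quickly and effectively.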
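
3. Court place card recognition is an indispensable technique for label management of court trial videos. This thesis applies the text detection algorithm based on the character detection network to a court place card recognition system and implements a prototype program for scene text information extraction. The program trains text detection and recognition models offline through transfer learning. After training, users specify the input data and hyperparameters interactively through the command line. The program then runs online testing and returns the recognition results to the user. Experiments show that the system achieves good speed and accuracy.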
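The rule-based parts of contributions 1 and 2 above (merging split regions, filtering candidates by geometric priors, and linking characters into text lines) can be illustrated with a minimal sketch. It assumes axis-aligned boxes `(x0, y0, x1, y1)` already produced by an ER/MSER extractor, which is not shown. All function names and threshold values (`max_gap`, `max_char_aspect`, `min_area`, `max_aspect`, `max_gap_ratio`, `min_height_ratio`) are hypothetical illustrations, not the thesis's actual rules. The learned stages (local-feature filtering, text entropy, and CNN classification) are omitted.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels; hypothetical format


def union(a: Box, b: Box) -> Box:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_neighbors(boxes: List[Box], max_gap: int = 2,
                    max_char_aspect: float = 1.2) -> List[Box]:
    """Merge vertically overlapping boxes at most max_gap pixels apart horizontally,
    provided the union is still roughly square, e.g. the two halves of a
    left-right structured Chinese character (thresholds are illustrative)."""
    merged = sorted(boxes)
    changed = True
    while changed:  # repeat until no pair can be merged
        changed = False
        out: List[Box] = []
        for b in merged:
            for i, a in enumerate(out):
                v_overlap = min(a[3], b[3]) - max(a[1], b[1])
                h_gap = max(a[0], b[0]) - min(a[2], b[2])
                u = union(a, b)
                square_enough = (u[2] - u[0]) <= max_char_aspect * (u[3] - u[1])
                if v_overlap > 0 and h_gap <= max_gap and square_enough:
                    out[i] = u
                    changed = True
                    break
            else:
                out.append(b)
        merged = sorted(out)
    return merged


def passes_geometric_prior(box: Box, min_area: int = 30,
                           max_aspect: float = 8.0) -> bool:
    """Reject degenerate, tiny, or extremely elongated regions."""
    w, h = box[2] - box[0], box[3] - box[1]
    if w <= 0 or h <= 0:
        return False
    return w * h >= min_area and max(w / h, h / w) <= max_aspect


def link_into_lines(chars: List[Box], max_gap_ratio: float = 1.0,
                    min_height_ratio: float = 0.7) -> List[List[Box]]:
    """Greedy left-to-right chaining of characters (positive heights assumed)
    into horizontal text lines."""
    lines: List[List[Box]] = []
    for c in sorted(chars):
        for line in lines:
            last = line[-1]
            h_c, h_l = c[3] - c[1], last[3] - last[1]
            similar_height = min(h_c, h_l) / max(h_c, h_l) >= min_height_ratio
            v_overlap = min(c[3], last[3]) - max(c[1], last[1])
            gap = c[0] - last[2]
            if similar_height and v_overlap > 0 and gap <= max_gap_ratio * max(h_c, h_l):
                line.append(c)
                break
        else:
            lines.append([c])
    return lines


def line_box(line: List[Box]) -> Box:
    """Bounding box of a whole text line."""
    box = line[0]
    for b in line[1:]:
        box = union(box, b)
    return box


if __name__ == "__main__":
    raw = [(10, 10, 19, 30), (20, 10, 30, 30),  # two halves of one character
           (40, 10, 60, 30), (70, 10, 90, 30),  # two more characters on the same line
           (200, 200, 202, 202),                # tiny speck, rejected by the prior
           (300, 10, 400, 14),                  # thin bar (e.g. a fence rail), rejected
           (300, 100, 320, 120)]                # isolated character elsewhere
    chars = [b for b in merge_neighbors(raw) if passes_geometric_prior(b)]
    print([line_box(line) for line in link_into_lines(chars)])
    # [(10, 10, 90, 30), (300, 100, 320, 120)]
```

This sketch only handles left-right splits and horizontal lines. Top-bottom splits and arbitrarily oriented text would need additional rules.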

English Abstract

With the rapid development of the Internet and the spread of portable digital devices, expressive multimedia content such as images and videos has gradually become an important carrier for obtaining and expressing information. This multimedia data includes many natural scene images rich in semantic information, and understanding that information benefits the analysis of multimedia content. Text in scene images plays a key role in understanding and describing scene semantics, so the extraction of text information from scene images is receiving increasing research attention.
Unlike optical character recognition (OCR) of scanned documents, scene text information extraction is usually divided into three steps: text detection, text segmentation, and text recognition. As the first step, text detection directly determines the performance of the entire system.
Natural scene images suffer from blur, occlusion, uneven illumination, and other distortions. They also contain text-like background clutter such as leaves, windows, and fences. Meanwhile, text in these images varies in language, color, font, and orientation. Fast and accurate text detection is therefore a very challenging task.

This thesis studies text detection in natural scene images in depth. Building on domain knowledge from pattern recognition and deep learning, we propose two algorithms, each targeting different difficulties in text detection. Compared with mainstream existing methods, both algorithms improve accuracy and speed. We also apply the proposed algorithms to court place card recognition and implement a prototype program for scene text information extraction. The main contributions are as follows:

1. This thesis presents a Bottom-up Cascaded Filtering (BCF) text detection algorithm. By addressing missed detections of small, blurry text and false alarms caused by text-like objects, it raises recall while maintaining high precision. First, the Extremal Region (ER) operator, chosen for its high character coverage, extracts character candidates from multichannel images. These candidates are then filtered sequentially using geometric prior knowledge and local image features. Next, the remaining characters are linked into text lines. The text lines are verified again using a proposed word entropy model and deep semantic features. Filtering at different levels ensures the accuracy and robustness of the algorithm. Experiments show that BCF performs well on two public datasets of the International Conference on Document Analysis and Recognition (ICDAR 2013 and ICDAR 2015), improving both recall and precision over existing methods.

2. Drawing on research in generic object detection, we propose a second text detection algorithm based on a Character Detection Network (CDN). It builds a Fast Region-based Convolutional Neural Network (Fast R-CNN) that detects individual characters, enabling multilingual text detection. First, candidates are extracted with a Maximally Stable Extremal Region (MSER) method that merges neighboring regions, which solves the connected-component splitting problem in languages such as Chinese. Next, the character candidate bounding boxes are mapped onto the Fast R-CNN feature map, and RoI pooling yields a fixed-length convolutional feature for each candidate. This feature is used to determine (1) whether the candidate is a character and (2) whether it is an end point of a text line. Finally, heuristic rules link the characters into text lines. Experiments show that CDN detects multilingual text in different scenes quickly, with good accuracy and robustness.

3. Court place card recognition is an indispensable technique for label management of court trial videos. This thesis applies the proposed CDN text detection algorithm to a court place card recognition system and implements a prototype program for scene text information extraction. The program trains text detection and recognition models offline through transfer learning. Once the models are trained, users specify input data and hyperparameters through the command line. The program then runs online testing and returns the visualized results to the user. Experiments show that the system achieves good speed and accuracy.
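The fixed-length feature step in contribution 2 relies on RoI pooling as introduced in Fast R-CNN. A minimal single-channel sketch in pure Python follows. It uses floor/ceil bin splitting so that every RoI, whatever its size, yields exactly `out_h * out_w` values. The stride of 16 (matching a VGG16 conv5 backbone) and the 2 × 2 output grid are assumptions chosen for readability. VGG16-based Fast R-CNN uses a 7 × 7 grid, and the thesis's actual settings are not stated here. The helper names `image_box_to_feature` and `roi_max_pool` are hypothetical.

```python
import math
from typing import List, Tuple

FeatureMap = List[List[float]]  # one channel, indexed [row][col]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive coordinates


def image_box_to_feature(box: Box, stride: int = 16) -> Box:
    """Project an image-space box onto feature-map cells (round half up).
    stride=16 is an assumption matching a VGG16 conv5 backbone."""
    return tuple(int(math.floor(v / stride + 0.5)) for v in box)


def roi_max_pool(feature: FeatureMap, roi: Box,
                 out_h: int = 2, out_w: int = 2) -> List[float]:
    """Max-pool the RoI into an out_h x out_w grid and return it flattened,
    so every RoI yields a fixed-length vector of out_h * out_w values."""
    height, width = len(feature), len(feature[0])
    # Clip the RoI to the feature map and keep it at least one cell wide/high.
    x0 = min(max(roi[0], 0), width - 1)
    y0 = min(max(roi[1], 0), height - 1)
    x1 = min(max(roi[2], x0), width - 1)
    y1 = min(max(roi[3], y0), height - 1)
    roi_h, roi_w = y1 - y0 + 1, x1 - x0 + 1
    pooled: List[float] = []
    for i in range(out_h):
        # floor/ceil bin boundaries guarantee that no bin is empty.
        r_start = y0 + math.floor(i * roi_h / out_h)
        r_end = y0 + math.ceil((i + 1) * roi_h / out_h)
        for j in range(out_w):
            c_start = x0 + math.floor(j * roi_w / out_w)
            c_end = x0 + math.ceil((j + 1) * roi_w / out_w)
            pooled.append(max(feature[r][c]
                              for r in range(r_start, r_end)
                              for c in range(c_start, c_end)))
    return pooled


if __name__ == "__main__":
    fmap = [[4 * r + c for c in range(4)] for r in range(4)]  # toy 4x4 feature map
    print(roi_max_pool(fmap, (0, 0, 3, 3)))  # [5, 7, 13, 15]
    roi = image_box_to_feature((16, 16, 32, 32))  # (1, 1, 2, 2)
    print(roi_max_pool(fmap, roi))  # [5, 6, 9, 10]
```

Regions of different sizes thus produce vectors of the same length. A real network repeats this per channel and feeds the concatenated result to its classification layers, which in the CDN answer the two questions listed above (character or not, text-line end point or not). Those layers are not sketched here.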

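Keywords: text detection; natural scene; Fast Region-based Convolutional Neural Network (Fast R-CNN); Maximally Stable Extremal Region (MSER)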
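Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/14829
Collection: Graduates_Master's Theses
Author affiliation: Institute of Automation, Chinese Academy of Sciences
First author affiliation: Institute of Automation, Chinese Academy of Sciences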
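Recommended citation (GB/T 7714):
Li Gen. Research and Application of Text Detection Techniques in Natural Scenes[D]. Beijing: Graduate University of Chinese Academy of Sciences, 2017.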
Files in this item:
李根论文_自然场景文本检测与应用_评阅后 (10627 KB); document type: thesis; access: restricted; license: CC BY-NC-SA

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.