Research on Text Detection Methods for Natural Scene Images


	Research on Text Detection Methods for Natural Scene Images
	邱泉1,2
	2017-12
Degree type	Master of Engineering
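Abstract (translated from Chinese)	In daily life, people constantly encounter natural scenes. These scenes contain not only a large amount of graphical information but also rich text information. Unlike general visual elements, text carries rich high-level semantic information that helps a computer interpret image content more accurately, so image text detection is important for image understanding. Today, leading-edge commercial applications such as translation software, autonomous driving, image retrieval, human-computer interaction, and augmented reality all require machines to understand the text in natural scenes. Algorithms that detect scene text accurately and efficiently are therefore in urgent market demand, and they are also an important research topic in document analysis and recognition. Current methods for text detection in natural scene images fall into three main directions: region-based methods, connected-component-based methods, and deep learning methods. Among connected-component-based methods, maximally stable extremal regions (MSERs) are popular and widely used. Building on this approach, this thesis proposes a flattened MSER method that, without requiring any training, quickly and effectively prunes large numbers of duplicate MSERs, improving both the speed and the accuracy of scene text detection. On the ICDAR 2013 Robust Reading dataset, the method prunes 70% of the redundant MSERs and runs nearly twice as fast as the conventional MSER approach. Compared with other methods, it only needs to train a classifier for text versus non-text connected components, which requires relatively few training samples and little training time. Pruning the MSERs greatly reduces computational complexity and improves runtime efficiency. The experimental results are also comparable to those of current state-of-the-art methods, demonstrating the effectiveness of the approach.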
English abstract	In daily life, people constantly encounter natural scenes. A natural scene contains not only a large amount of graphical information but also text information. Unlike general visual elements, text carries rich high-level semantic information that can help a computer understand an image more accurately, so image text detection is very important for image understanding. Today, commercial applications such as translation software, autonomous driving, image retrieval, human-computer interaction, and augmented reality all need machines to understand the text in natural scenes. Accurate and efficient detection of scene text has therefore become an urgent market need, and it is also an important research area in document analysis and recognition. At present, the main approaches to text detection in natural scene images are region-based methods, connected-component-based methods, and deep learning methods. Among the methods proposed so far, the maximally stable extremal region (MSER) method, a connected-component-based approach, has been widely adopted. In this thesis, we propose an efficient method, called the flattening method, that quickly prunes the large number of overlapping MSERs and thereby improves the speed and accuracy of MSER-based scene text detection. On the ICDAR 2013 Robust Reading dataset, our method removes 70% of the redundant MSERs, and the program runs nearly twice as fast as the traditional MSER method. Compared with other methods, ours only needs to train a text/non-text connected-component classifier, which requires fewer training samples and less training time. Pruning the MSERs greatly reduces computational complexity and improves computational efficiency. The experimental results are also comparable to those of state-of-the-art methods.
Keywords	natural scene text detection; natural scene text extraction; maximally stable extremal regions; flattening
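Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/15616
Collection	Graduates_Master's Theses
Author affiliations	1. Institute of Automation, Chinese Academy of Sciences; 2. University of Chinese Academy of Sciences
First author affiliation	Institute of Automation, Chinese Academy of Sciences
Recommended citation (GB/T 7714)	邱泉. Research on Text Detection Methods for Natural Scene Images[D]. Beijing. University of Chinese Academy of Sciences, 2017.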

Files in this item
File name/size	Document type	Version type	Access type	License
学位论文.pdf (1989 KB)	Thesis		Restricted access	CC BY