Research on Text Detection and Recognition Methods in Natural Scene Images (自然场景图像中的文本检测与识别方法研究)

	Research on Text Detection and Recognition Methods in Natural Scene Images
Alternative title	Scene Text Detection and Recognition in Natural Images
	Gao Song (高嵩)
	2015-05-27
Degree type	Doctor of Engineering
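Chinese abstract (translated)	With the widespread adoption of camera-equipped smart mobile devices and the rapid development of the Internet, the number of images and videos available to people is growing rapidly. If computers could automatically understand the high-level semantic information contained in images and videos, their computing power and storage capacity could help people manage and use this massive volume of images and videos more effectively. Scene text is an important carrier of high-level image semantics, and text detection and recognition in natural scene images has attracted increasing attention in recent years. Taking the characteristics of scene text into account and starting from the perspective of feature representation, this thesis presents a systematic study of scene text detection and recognition methods. The main contributions are as follows:
1. Because the feature distributions of scene text training and test samples differ, a scene text detector cannot guarantee its performance on a particular image even when a large number of training samples were used in training. To address this problem, this thesis proposes an adaptive scene text detection method based on transfer learning of a cascade classifier. Drawing on the idea of transfer learning and working at the feature level, the method assumes that the discriminative power of a feature is closely tied to the scene type, and it detects text in a specific scene adaptively by adjusting online the weights that features carry in the classification decision. Specifically, we choose cascade Adaboost as the scene text detector and provide as many features as possible for the weak classifiers to select from. Then, according to each weak classifier's ability to classify high-confidence test samples, we readjust the voting weight of the corresponding feature in the Adaboost decision, thereby detecting scene text adaptively. Experimental results on international public datasets demonstrate the effectiveness of transfer-based detection.
2. To bring the local stroke and global structure information of scene text into the feature representation, this thesis proposes a scene text feature representation based on a discriminative stroke bank, which uses the local maximum responses of multi-scale stroke detectors as features. It overcomes two problems of earlier stroke-structure methods: strokes selected at a single scale, and no guarantee that the selected strokes are discriminative. The method collects stroke training samples through keypoint annotation of scene character training samples, and slides the trained stroke detectors over the positions where positive stroke samples occur to obtain local maximum responses. This reduces computation time and, at the same time, highlights the positional nature of strokes and strengthens the discriminative power of the feature representation. In particular, the thesis uses the weight coefficients of a linear support vector machine to remove redundancy among stroke detectors, selectively retaining the most discriminative ones, which lightens the computational load while further improving classification. Experimental results on public datasets demonstrate the superiority of this feature representation.
3. To address the heavy keypoint annotation burden of stroke-structure methods and their failure to fully exploit the high-level semantic information of co-occurring strokes, we build on the work of the previous chapter and propose a scene text recognition method based on a spatiality embedded dictionary and co-occurrence strokes. The method first introduces the concept of a spatiality embedded dictionary within the coding/pooling framework: codewords of the dictionary represent specific stroke structures, and each codeword is associated with a specific coding region of the image. This incorporates the global structure information of scene text into the feature representation, overcomes the problem that scene text images are too small for position information to be introduced with an image pyramid, and also effectively reduces coding time. Next, the thesis trains a linear support vector machine on the spatiality embedded dictionary coding vectors and selects discriminative codewords according to the support vector machine weight coefficients. Finally, borrowing ideas from deep learning, the thesis learns a second-layer sparse dictionary of co-occurrence strokes on top of the first-layer discriminative spatiality embedded dictionary coding vectors, bringing the high-level semantic information of multi-stroke co-occurrence into the scene text feature representation and further improving scene text classification performance. The experimental results show that this...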
English abstract	With the widespread use of camera-equipped intelligent terminals and the rapid development of the Internet, the number of available images and videos has grown rapidly. If computers can automatically understand the high-level semantic information contained in images and videos, they can help people manage and use these images and videos efficiently and effectively. Scene text is an important carrier of image semantic information, and in recent years the detection and recognition of text in scene images has drawn increasing attention. Taking the characteristics of scene text into full consideration and starting from the perspective of feature representation, this thesis conducts a thorough study of scene text detection and recognition. The main contributions are as follows:
1. Because of the feature distribution gap between scene text training and testing samples, a scene text detector cannot guarantee its performance even when a large number of training samples are used in the training stage. To deal with this issue, this thesis proposes a new scene text detection method based on transferring a cascade classifier. The method draws on the idea of transfer learning and starts from the perspective of features. We consider the discriminative power of every feature to be highly scene-dependent, and we adjust feature classification weights online to detect scene text adaptively. Specifically, we choose cascade Adaboost as the scene text detector, provide as many features as possible for the weak classifiers to choose from, and modify the weak classifiers' weights according to their ability to classify high-confidence testing samples. Our method is thus able to detect scene text adaptively. Experiments on public datasets demonstrate the effectiveness of transfer detection.
2. To introduce local strokes and global structure into the feature representation of scene text, we propose a new feature representation method based on a discriminative stroke bank, using the maximal local response values of multi-scale stroke detectors as features. This method overcomes the single scale and poor discriminative power of the strokes selected by earlier stroke-based methods. We collect stroke training samples based on keypoint labeling of scene character images and slide the stroke detectors over the positions of positive strokes to obtain maximal local response values. Introducing response regions reduces computation time, highlights the locality of strokes, and strengthens the discriminative power of the feature...
Keywords	Scene Text Feature; Scene Text Detection; Adaptivity; Scene Text Recognition; Discriminative Stroke Bank; Spatiality Embedded Dictionary; Co-occurrence Strokes
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6698
Collection	Graduates_Doctoral Dissertations
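Recommended citation (GB/T 7714)	Gao Song. Research on Text Detection and Recognition Methods in Natural Scene Images [D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2015.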

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20121801462803 (25766 KB)			Not currently open	CC BY-NC-SA