Research on End-to-End Detection and Recognition of Irregular Scene Text

	Research on End-to-End Detection and Recognition of Irregular Scene Text
	徐珊波
	2022-05-19
Pages	100
Degree type	Master's
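Abstract (translated from Chinese)	With the widespread use of smart electronic devices equipped with cameras, large numbers of natural scene images containing text are captured, stored, and used to convey information. Accurately extracting the high-level text information in scene images effectively aids image content understanding and plays an increasingly prominent role in fields such as image retrieval, intelligent transportation, and augmented reality. Compared with scanned document images, irregular text in scene images exhibits far more diverse shape variations, which makes end-to-end scene text recognition challenging. This thesis studies end-to-end recognition of irregular scene text; its main contributions are as follows. (1) For general irregular scene text, this thesis proposes an end-to-end text recognition model assisted by corners and characters. The model combines the strengths of end-to-end recognition methods based on coordinate regression and on instance segmentation, learning a text corner heatmap and a character position heatmap at little computational cost. The text corner heatmap is used to correct the inaccurate text corner coordinates predicted by regression, and the character position heatmap is used to enhance character-center features to assist text recognition. Detection and recognition results on two benchmark datasets demonstrate the model's effectiveness. (2) To address the difficulty of describing the contour of arched (ring-shaped) text and of rectifying its features, this thesis proposes an end-to-end arched text recognition model based on arc alignment. The detection module locates the control points of the arc boundary (start point, midpoint, and end point) and uses them to describe the text boundary. The arc sampling structure aligns arc-shaped text features into regular rectangular features that are fed to the recognition module, thereby enabling end-to-end training of the detection and recognition modules. Experiments on the English coin dataset proposed in this thesis show that the model preserves the spatial information of arched text and achieves state-of-the-art results on both detection and recognition metrics. (3) To address the inconsistency between the detection and recognition of arched text, this thesis proposes an end-to-end arched text recognition model with automatic correction. To make full use of the gradients back-propagated from the recognition branch to the detection branch, the model replaces the original arc sampling points with a differentiable arc sampling point generator, allowing recognition results to automatically correct detection results. To alleviate the mismatch between the recognition branch's input features at training time and at test time, the model selects ground-truth and predicted text coordinates with equal probability for feature sampling during training. Experiments show that automatic correction clearly improves the model's detection and recognition performance, and that the model far surpasses other advanced methods on every metric.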
English abstract	With the wide application of smart electronic devices with cameras, a large number of natural scene images containing text are photographed, stored, and used to convey information. Accurately extracting high-level text information from scene images can effectively assist image content understanding, and it plays an increasingly significant role in image retrieval, intelligent transportation, augmented reality, and other fields. However, compared with scanned document images, irregular text in natural scene images has more diverse shapes, which poses a challenge to the end-to-end scene text recognition task. This thesis studies the end-to-end recognition of irregular text in natural scenes. Its main contributions can be summarized as follows: (1) For general irregular scene text, an end-to-end text recognition model based on corner and character assistance is proposed. Combining the advantages of end-to-end methods based on coordinate regression and on instance segmentation, this model learns a text corner heatmap and a character position heatmap at a small computational cost. The text corner heatmap is used to rectify the inaccurate text corner coordinates obtained by the regression-based detection branch. The character position heatmap is used to enhance character-center features and assist text recognition. The detection and recognition results on two benchmark datasets validate the effectiveness of this model. (2) Because the contour of arched text is difficult to describe and its features are difficult to rectify into rectangular ones, this thesis proposes an end-to-end arched text recognition model based on arc-align. The detection module of this model locates the control points of the arc-shaped text boundary, that is, the start point, midpoint, and end point, and uses them to describe the boundary. The arc-align structure transforms the arched text features into rectangular features that serve as the input of the recognition module, so that the detection module and the recognition module can be trained end to end. Experiments on the proposed English coin dataset show that this model preserves the spatial information of arched text and achieves state-of-the-art results in both detection and recognition metrics. (3) To address the inconsistency between arched text detection and recognition, an end-to-end arched text recognition model with automatic correction is proposed. To enable the recognition loss to be backpropagated to the detection branch, this model replaces the original arc-align sampler with a differentiable feature sampler, so that the recognition result can automatically correct the detection result. To alleviate the inconsistency in the recognition branch's input between the training phase and the testing phase, the ground-truth and the predicted text coordinates are selected with equal probability for feature sampling during training. Experiments show that the automatic correction method improves both the detection and recognition metrics, and that the model's overall performance far exceeds that of other state-of-the-art methods.
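The arc-alignment idea in contribution (2) can be illustrated with a small geometric sketch. This is not the thesis's implementation: the function names, the `thickness` parameter, and the outer-to-inner row order are all assumptions made for illustration. Under those assumptions, the code fits a circle through the three control points (start, midpoint, end) and lays out a rectangular grid of sampling coordinates along the arc. A feature sampler, such as bilinear interpolation, could then read features at those coordinates.

```python
import math


def circle_from_three_points(p0, p1, p2):
    """Return (cx, cy, r) of the circle through three non-collinear points."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    d = 2.0 * (x0 * (y1 - y2) + x1 * (y2 - y0) + x2 * (y0 - y1))
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear; no unique circle")
    s0, s1, s2 = x0 * x0 + y0 * y0, x1 * x1 + y1 * y1, x2 * x2 + y2 * y2
    cx = (s0 * (y1 - y2) + s1 * (y2 - y0) + s2 * (y0 - y1)) / d
    cy = (s0 * (x2 - x1) + s1 * (x0 - x2) + s2 * (x1 - x0)) / d
    return cx, cy, math.hypot(x0 - cx, y0 - cy)


def arc_sampling_grid(start, mid, end, thickness, n_cols, n_rows):
    """Hypothetical helper: n_rows x n_cols grid of (x, y) points on the arc band.

    Columns run from `start` to `end`, passing through `mid`.
    Rows run from radius r + thickness/2 to r - thickness/2 (assumed order).
    """
    cx, cy, r = circle_from_three_points(start, mid, end)
    a0 = math.atan2(start[1] - cy, start[0] - cx)
    a1 = math.atan2(mid[1] - cy, mid[0] - cx)
    a2 = math.atan2(end[1] - cy, end[0] - cx)

    two_pi = 2.0 * math.pi
    # Sweep in the positive angular direction from start to end ...
    sweep = (a2 - a0) % two_pi
    # ... unless the midpoint lies on the other side of the circle.
    if (a1 - a0) % two_pi > sweep:
        sweep -= two_pi

    grid = []
    for i in range(n_rows):
        t = i / (n_rows - 1) if n_rows > 1 else 0.5
        radius = r + thickness * (0.5 - t)
        row = []
        for j in range(n_cols):
            u = j / (n_cols - 1) if n_cols > 1 else 0.5
            angle = a0 + sweep * u
            row.append((cx + radius * math.cos(angle),
                        cy + radius * math.sin(angle)))
        grid.append(row)
    return grid
```

For example, `arc_sampling_grid((1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), 0.2, 32, 8)` yields an 8 x 32 grid covering a half ring around the unit circle. Because every grid coordinate is a smooth function of the three control points, a sampler built this way could in principle pass gradients back to the predicted points, which is the property contribution (3) relies on.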
Keywords	irregular scene text; arched text; end-to-end detection and recognition; automatic correction
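Subject area	Pattern recognition
Discipline category	Engineering::Control Science and Engineering
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/48497
Collection	Graduates_Master's Theses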
Recommended citation (GB/T 7714)	徐珊波. 不规则场景文本的端到端检测与识别研究[D]. 中国科学院自动化研究所. 中国科学院自动化研究所,2022.

Files in this item
File name/size	Document type	Version type	Access type	License
硕士学位论文-不规则场景文本的端到端检测 (24120 KB)	Thesis		Restricted access	CC BY-NC-SA