Methods and Applications of Text Recognition in Camera-Captured Images


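Title	Methods and Applications of Text Recognition in Camera-Captured Images (拍照图像文本识别方法与应用)
Author	冯子朋 (Feng Zipeng)
Date	2020-05
Pages	88
Degree type	Master's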
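Abstract (translated from Chinese)	With the development of computer vision and the rapid spread of smart devices, text recognition for camera-captured images is increasingly widely used. Paper invoices are an important part of financial systems, and entering them by hand has long consumed substantial labor and resources. A camera-based invoice recognition system would effectively reduce the cost of invoice entry. Compared with scanned images, however, camera-captured images are often affected by arbitrary shooting angles and paper wrinkles, which distort the text through skew, perspective, and curvature and make it harder to recognize. This thesis studies the recognition of irregular text in camera-captured images. Its main work and contributions are summarized as follows: (1) To address the difficulty of recognizing irregular text lines, the thesis proposes a text recognition model based on centerline rectification. The centerline rectification module builds on the spatial transformer network: it predicts sampling points on the text centerline and adaptively maps the text to a horizontal arrangement. The module is trained entirely by the gradients back-propagated from the sequence recognition module and needs no additional supervision. Recognition metrics and visualizations on multiple datasets demonstrate its effectiveness. (2) To address the loss of image spatial information in sequence recognition models, the thesis proposes a text recognition model based on spatial attention. An attention sequential grid and a two-dimensional decoder are designed to apply attention weighting and semantic encoding and decoding directly to the two-dimensional image features. To keep the attention strictly aligned across time steps, a prior mechanism is proposed to supervise the spatial attention. Experiments show that the method preserves both the spatial and the semantic information of irregular text and achieves state-of-the-art recognition results on multiple datasets. (3) To address the weak coupling between the detection and recognition stages for irregular text, the thesis proposes an end-to-end text recognition model that integrates detection and recognition. To make full use of the arbitrarily shaped text masks generated by the detection branch, a feature mapping layer that applies an affine transformation to the mask is proposed. The detection and recognition branches are connected through this layer, which allows end-to-end joint training. Experiments show that the model simplifies the recognition pipeline, reduces model size, and effectively improves end-to-end recognition performance. (4) To address the large volume of invoices that must be entered manually into financial systems, the thesis builds a camera-captured invoice recognition system around the proposed text recognition algorithms, covering the full workflow of invoice capture, text detection and recognition, and structured output. A corresponding client and server were developed to keep the system running smoothly and efficiently. The launch and successful rollout of the system demonstrate the effectiveness and practicality of the proposed algorithms.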
English abstract	With the development of computer vision and the popularization of smartphones, text recognition for camera-captured images has become more and more widely used. Invoices are an important part of the financial system, and manual entry of invoices has long consumed a great deal of manpower and material resources. A camera-captured invoice recognition system would greatly reduce the cost of invoice entry. However, compared with scanned images, camera-captured images are often affected by the shooting angle and by paper wrinkles, which distort the text through tilt, perspective, and curvature and increase the difficulty of text recognition. This thesis studies the problem of irregular text recognition in camera-captured images. The main work and innovations are summarized as follows: (1) To address the difficulty of recognizing irregularly arranged text lines, a text recognition model based on centerline rectification is proposed. The centerline rectification module is based on a spatial transformer network. It predicts sampling points on the text centerline and maps them to horizontally arranged text, adaptively rectifying the input image. The rectification module is trained entirely by the gradient passed back from the sequence recognition module, without additional supervision. Recognition metrics and visual results on multiple datasets demonstrate the validity of this module. (2) To address the loss of spatial information in sequence recognition methods, a text recognition model based on spatial attention is proposed. An attention sequential grid and a two-dimensional decoder are designed to weight and decode the two-dimensional image features directly. To ensure that attention is strictly aligned over time steps, a prior mechanism is proposed to supervise the spatial attention. Experiments show that this method preserves both the spatial and the semantic information of irregular text lines and achieves state-of-the-art recognition results on multiple datasets. (3) To address the insufficient connection between the detection and recognition stages for irregular text, an end-to-end text recognition model is proposed. To make full use of the arbitrarily shaped text mask generated by the detection branch, a feature mapping layer that can apply an affine transformation to the mask is proposed. The detection and recognition branches are connected through the feature mapping layer, which enables end-to-end joint training. Experiments show that this model simplifies the recognition process, reduces the size of the model, and effectively improves end-to-end recognition performance. (4) To address the large number of invoices that must be entered manually into the financial system, a camera-captured invoice recognition system is built on the proposed text recognition algorithms. It covers the entire process from invoice collection through text detection and recognition to structured output. A corresponding client and server were developed to ensure that the system runs smoothly and efficiently. The system's launch and successful rollout demonstrate the effectiveness and practicality of the proposed algorithms.
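The sampling step behind contribution (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: in the thesis the centerline points are predicted by a network built on a spatial transformer and trained through the recognizer's gradients, whereas here the points are simply given, and the function names `bilinear` and `rectify` as well as the parameters `half_height` and `out_h` are hypothetical. The sketch only shows how sampling along the normals of a centerline turns a slanted or curved text line into a horizontal strip.

```python
import math

def bilinear(img, x, y):
    """Sample a grayscale image (list of rows) at real-valued (x, y),
    clamping coordinates to the image border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def rectify(img, centerline, half_height, out_h):
    """Unwarp text along a centerline into a horizontal strip.

    centerline: list of (x, y) points, one per output column (assumed given;
    a learned model would predict them).
    half_height: distance sampled on each side of the centerline, in pixels.
    out_h: number of rows in the rectified strip.
    """
    n = len(centerline)
    out = [[0.0] * n for _ in range(out_h)]
    for j, (cx, cy) in enumerate(centerline):
        # Estimate the tangent from neighbouring points (one-sided at the ends).
        px, py = centerline[max(j - 1, 0)]
        qx, qy = centerline[min(j + 1, n - 1)]
        tx, ty = qx - px, qy - py
        norm = math.hypot(tx, ty) or 1.0
        # Normal = tangent rotated by 90 degrees; points "down" for left-to-right text.
        nx, ny = -ty / norm, tx / norm
        for i in range(out_h):
            # Signed offset along the normal, from -half_height to +half_height.
            t = (2 * i / (out_h - 1) - 1) * half_height if out_h > 1 else 0.0
            out[i][j] = bilinear(img, cx + t * nx, cy + t * ny)
    return out

# Usage: a diagonal stroke in an 8x8 image becomes the middle row of the strip.
image = [[1.0 if r == c else 0.0 for c in range(8)] for r in range(8)]
points = [(float(k), float(k)) for k in range(1, 7)]
strip = rectify(image, points, math.sqrt(2), 3)
```

Because bilinear sampling is differentiable with respect to the sampling coordinates, a rectifier of this kind can in principle be trained from the recognition loss alone, which is consistent with the abstract's statement that no additional supervision is needed.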
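Keywords	camera-captured text recognition; centerline rectification; spatial attention; end-to-end recognition; invoice recognition system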
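Language	Chinese
Seven major directions: sub-direction classification	Text recognition and document analysis
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/39245
Collection	State Key Laboratory of Management and Control for Complex Systems_Image Analysis and Machine Vision
Recommended citation (GB/T 7714)	冯子朋. 拍照图像文本识别方法与应用 [Methods and Applications of Text Recognition in Camera-Captured Images][D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2020.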

Files in this item
File name/size	Document type	Version type	Access type	License
拍照图像文本识别方法与应用-冯子朋.pd (5459KB)	Thesis		Open access	CC BY-NC-SA