English Abstract |
With advances in computer vision and the spread of smartphones, text recognition for camera-captured images is increasingly widely used. Invoices are an important part of the financial system, and manual invoice entry has long consumed considerable manpower and material resources. A recognition system for camera-captured invoices would greatly reduce the cost of invoice entry.
However, compared with scanned images, camera-captured images are often affected by the shooting angle and by wrinkles in the paper. These cause text distortions such as tilt, perspective, and curvature, which make text recognition more difficult. This paper studies the problem of irregular text recognition in camera-captured images. The main work and contributions are summarized as follows:
(1) To address the difficulty of recognizing irregularly arranged text lines, this paper proposes a text recognition model based on centerline rectification. The centerline rectification module builds on a spatial transformer network: it predicts sampling points along the text centerline and maps them onto a horizontally arranged text line, thereby adaptively rectifying the input image. The rectification module is trained entirely by the gradients back-propagated from the sequence recognition module, with no additional supervision. Recognition metrics and visualizations on multiple datasets demonstrate the effectiveness of this module.
(2) To address the loss of spatial information in sequence recognition methods, this paper proposes a text recognition model based on spatial attention. An attention sequential grid and a two-dimensional decoder are designed to weight and decode the two-dimensional image features directly. To keep the attention strictly aligned, a prior mechanism is proposed to supervise the spatial attention. Experiments show that this method preserves both the spatial and the semantic information of irregular text lines and achieves state-of-the-art recognition results on multiple datasets.
(3) To address the weak connection between the detection and recognition stages for irregular text, this paper proposes an end-to-end text recognition model. To make full use of the arbitrary-shape text masks generated by the detection branch, a feature mapping layer that applies an affine transformation to the mask is proposed. The detection and recognition branches are connected through this feature mapping layer, which enables end-to-end joint training. Experiments show that the model simplifies the recognition pipeline, reduces the model size, and effectively improves end-to-end recognition performance.
(4) To address the large number of invoices that must be entered manually into the financial system, this paper builds a camera-captured invoice recognition system based on the proposed text recognition algorithms. The system covers the entire pipeline, from invoice capture through text detection and recognition to structured output. A corresponding client and server are developed to keep the system running smoothly and efficiently. The system's launch and successful adoption demonstrate the effectiveness and practicality of the algorithms proposed in this paper. |
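As an illustration of what contribution (2) means by weighting two-dimensional image features directly, the following is a minimal sketch of generic scaled dot-product attention over a feature map. It is not the attention sequential grid, the two-dimensional decoder, or the prior mechanism of this paper, whose details the abstract does not give. The function name `spatial_attention`, the `(H, W, C)` feature layout, and the single `query` vector standing in for a decoder state are all assumptions made for the example.

```python
import numpy as np


def spatial_attention(features: np.ndarray, query: np.ndarray):
    """Weight a 2-D feature map with one query vector (illustrative only).

    features: (H, W, C) feature map, e.g. from a hypothetical CNN backbone.
    query:    (C,) hypothetical decoder state for the current character.
    Returns the (H, W) attention map and the (C,) context vector.
    """
    h, w, c = features.shape
    flat = features.reshape(h * w, c)      # one row per spatial position
    scores = flat @ query / np.sqrt(c)     # scaled dot-product score per position
    scores = scores - scores.max()         # stabilise the softmax numerically
    weights = np.exp(scores)
    weights = weights / weights.sum()      # normalise over all H*W positions
    context = weights @ flat               # attention-weighted sum of features
    return weights.reshape(h, w), context


# Toy usage: random features and a random query (assumed sizes).
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 32, 16))
query = rng.normal(size=16)
attn_map, context = spatial_attention(features, query)
```

Because the softmax runs over every position of the `H x W` grid instead of over a flattened one-dimensional sequence, the attention map keeps the vertical position of each character, which is the information that curved or tilted text lines depend on.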