Thesis Advisor: 肖柏华
Degree Grantor: 中国科学院研究生院
Place of Conferral: Beijing
Keywords: structured document; binarization; pattern registration; content extraction; text string recognition
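4. We design and implement a text string recognition technique that combines a Long Short-Term Memory (LSTM) recurrent neural network with a Convolutional Neural Network (CNN). The CNN extracts a feature sequence from the text image, which serves as the input to the LSTM network, and Connectionist Temporal Classification (CTC) is used to solve the data-label alignment problem during model training. Because deep neural networks need a large number of training samples, this thesis also designs and implements a simple algorithm for generating text string image samples. The recognition model is trained on a synthetic dataset of taxi tickets and tested on real taxi ticket data, where it achieves fairly good recognition results.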
Other Abstract: A structured document is a document with a largely fixed format, such as a bill, a certificate, or a bank statement. Such documents are widely used in education, finance, logistics, taxation, administrative management, and many other industries. At present, converting paper structured documents into electronic form relies mainly on manual data entry, which incurs high labor and resource costs. Automatic recognition of structured documents could therefore bring substantial economic benefit and broad social value.
An automatic recognition system for structured documents comprises image preprocessing, format matching, extraction of the content to be recognized, and text recognition. This thesis contributes to each of these parts, as follows:
1. We propose a novel document image binarization method based on the structural symmetry of strokes. The symmetry has two aspects: the gradient directions on the two edges of a stroke are opposite, and foreground and background pixels coexist. The method uses these properties to extract structurally symmetric elements from the document image and then performs local binarization. It also handles uneven illumination and stain contamination through background normalization. The method achieves satisfactory results on several public datasets and on a dataset collected in real projects.
2. We propose a method of pattern definition. A pattern definition describes a structured layout, including both its structure and its logic. We represent the information in a structured document as Label-Value pairs, and we built an interactive tool that guides the user through completing the pattern definition. The method and the tool are used in recognition systems for bank documents and tickets.
3. We propose a method of pattern registration and content extraction. It addresses two problems: classifying the format of a document image and extracting the text regions to be recognized. Based on the pattern definition, we build a multi-scale flexible framework and combine it with a sliding-window strategy to find the best match. Tests on bank statement datasets verify the validity and robustness of the algorithm.
4. We design and implement a text recognition technique based on Long Short-Term Memory (LSTM) networks and convolutional neural networks (CNNs). The CNN extracts a feature sequence from the text image as the input to the LSTM, and Connectionist Temporal Classification (CTC) is used to solve the data-label alignment problem. Because deep learning needs a large number of training samples, this thesis also designs a simple algorithm for generating text image samples. The model is trained on a synthetic dataset of taxi tickets and achieves good results when tested on a real taxi ticket dataset.
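The role of CTC in item 4 can be illustrated at inference time. What follows is a minimal sketch of greedy (best-path) CTC decoding, offered as an illustration and not as the thesis's implementation. It takes per-frame label scores, such as those a CNN+LSTM model might emit, picks the most likely label in each frame, merges consecutive repeats, and drops the blank label. The names `ctc_greedy_decode` and `one_hot`, the blank index 0, and the digit alphabet are assumptions made for this example.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank label


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    alphabet: str,
    blank: int = BLANK,
) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    Label k (k >= 1) is assumed to map to alphabet[k - 1].
    """
    # Most likely label index at each time step.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Merge runs of identical consecutive labels into one.
    collapsed = [label for label, _ in groupby(best_path)]
    # Drop blanks and map the remaining labels to characters.
    return "".join(alphabet[label - 1] for label in collapsed if label != blank)


def one_hot(index: int, size: int) -> List[float]:
    """Hypothetical helper: a score vector that puts all mass on one label."""
    return [1.0 if i == index else 0.0 for i in range(size)]


if __name__ == "__main__":
    alphabet = "0123456789."  # illustrative character set for a fare amount
    path = [2, 2, 0, 3, 0, 0, 11, 6, 6]  # per-frame best labels
    frames = [one_hot(i, len(alphabet) + 1) for i in path]
    print(ctc_greedy_decode(frames, alphabet))  # prints "12.5"
```

The blank label is what lets CTC represent doubled characters: the path `[2, 0, 2]` decodes to "11", whereas `[2, 2]` collapses to a single "1". During training, the CTC loss sums over all such frame-level paths that collapse to the target string, which is why no per-frame alignment labels are needed.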
Subject Area: Computer Vision and Pattern Recognition
Document Type: Thesis
Recommended Citation
GB/T 7714
何坤. 格式文档图像配准与识别方法及应用[D]. 北京. 中国科学院研究生院,2017.
Files in This Item:
File Name/Size | DocType | Version | Access | License
格式文档图像配准与识别方法及应用.pdf (4703KB) | Thesis | | Not yet open | CC BY-NC-SA
