Methods and System for Recognizing Camera-Captured Invoice Images
王淼
2019-05-31
Pages: 65
Degree type: Master's
Chinese Abstract


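With the development of computer software and hardware and the widespread use of smartphones, multimedia information carried by digital images and video is rapidly becoming one of the mainstream means of communication. Text in images conveys high-level semantic information, so demand for automatic detection and recognition of text in images is growing steadily. As mobile terminals such as phones become ubiquitous, camera-captured images are playing an increasingly important role. Invoices are a common type of document image in everyday life, and automatically recognizing camera-captured invoice images offers clear advantages, saving a great deal of manual labor. However, recognizing text in camera-captured invoice images is difficult: there are many invoice types with complex layouts and differing key fields, invoice paper bends and deforms easily, print quality varies widely, and photographing introduces blur, shadows, and glare.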

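This thesis carries out a series of studies on text detection and recognition in camera-captured invoice images. Its main contributions are as follows: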

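(1) A quality assessment algorithm for camera-captured invoice image sequences, based on local gradient distribution, is designed and implemented. The algorithm selects the best-quality image from a burst of consecutively captured images and then judges that image's quality; images too poor to be worth recognizing are not passed on to subsequent recognition. Experiments show that the method effectively solves the problem of choosing which mobile-phone capture to recognize.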

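(2) To cope with the large number of invoice types, a method for registering and classifying new invoice types is designed and implemented. It uses a CNN to extract features, GLVQ to learn templates, and KNN to classify. It can classify common invoice images and, given only a small number of samples of a new invoice type, quickly extend classification to that type.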

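(3) Building on Focal Loss from general object detection, a Focal Loss-based text detection method for invoices is proposed. Experimental results show that the method effectively detects text in arbitrary orientations.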

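(4) An adaptive end-to-end text line recognition model is proposed. By adding deformable convolution, the method enlarges the receptive field, enabling the network to adaptively learn an implicit segmentation. Experiments show that the method effectively improves recognition accuracy.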

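(5) Productization: large numbers of new and legacy invoices in finance and banking would otherwise have to be entered by hand. To address this, an intelligent cloud-based shared financial services platform was built that performs structured recognition of camera-captured invoice images. The algorithms designed in this thesis form the core of the system, and its launch and stable operation confirm their effectiveness and practicality.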
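Item (1) above picks the best image from a burst using local gradient statistics and rejects frames too poor to recognize. This page does not describe the thesis's actual criterion, so the following is only a minimal sketch under stated assumptions: it uses a Tenengrad-style squared-gradient energy averaged over 8x8 blocks, keeps the top 30% of blocks (assumed to cover text), and rejects the burst when the best score falls below a hypothetical threshold `min_score`. All function names and parameter values are illustrative, not taken from the thesis.

```python
def gradient_energy(img):
    """Squared central-difference gradient (gx^2 + gy^2) at each interior pixel
    of a 2-D grayscale image given as a list of rows; border pixels stay 0."""
    h, w = len(img), len(img[0])
    energy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            energy[y][x] = gx * gx + gy * gy
    return energy


def sharpness_score(img, block=8, top_fraction=0.3):
    """Hypothetical local-gradient score: mean gradient energy per block, then
    the average of only the highest-scoring blocks, so text regions dominate
    and blank paper does not dilute the score."""
    energy = gradient_energy(img)
    h, w = len(img), len(img[0])
    block_means = []
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            total = sum(energy[y][x]
                        for y in range(by, by + block)
                        for x in range(bx, bx + block))
            block_means.append(total / (block * block))
    if not block_means:
        return 0.0
    block_means.sort(reverse=True)
    k = max(1, int(len(block_means) * top_fraction))
    return sum(block_means[:k]) / k


def select_best_frame(frames, min_score):
    """Return (index, score) of the sharpest frame in a burst, or None when even
    the best frame falls below min_score and should not be recognized."""
    scores = [sharpness_score(f) for f in frames]
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] < min_score:
        return None
    return best, scores[best]
```

Averaging squared rather than absolute gradients is a deliberate choice here: across a monotone edge, the sum of absolute differences stays roughly constant when blur spreads the edge out, while the sum of squares drops, so only the squared form separates sharp frames from blurred ones.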

English Abstract

With the development of computer software and hardware technology and the wide use of mobile phones, multimedia information based on digital images and videos is rapidly becoming one of the mainstream means of information exchange. Text in images expresses high-level semantic information, so the need for automatic detection and recognition of text in images is increasing. With the wide use of mobile terminals such as mobile phones, camera-captured images are becoming increasingly important. The invoice is a common type of document in daily life, and automatic recognition of camera-captured invoice images offers a strong advantage: it can save a great deal of manpower. However, invoices come in many types with complex layouts, their key information differs from type to type, the paper bends and deforms easily, and print quality varies. Photographing also introduces blur, shadows, and reflections. These problems make text recognition in camera-captured invoice images difficult.

This thesis carries out a series of studies on text detection and recognition in camera-captured invoice images. The main contents are as follows:

(1) We design and implement a quality evaluation algorithm for camera-captured invoice image sequences based on local gradient distribution. The method selects the best-quality image from a sequence of consecutively shot images and then judges that image's quality; images of very poor quality are excluded from subsequent recognition. Experiments show that this method effectively solves the problem of choosing which mobile-phone capture to recognize.

(2) To handle the wide variety of invoices, we design and implement a method for registering and classifying new invoice types. The method uses a CNN to extract features, GLVQ to learn templates, and KNN to classify. It can not only classify common invoice images but also quickly support a new invoice type given only a few samples of it.

(3) Building on Focal Loss from general object detection, we propose an invoice text detection method based on Focal Loss. Experimental results show that the method can effectively detect text in arbitrary orientations.

(4) We propose an adaptive end-to-end text line recognition model. The method adds deformable convolution to expand the receptive field, enabling the network to adaptively learn an implicit segmentation. Experiments show that the method effectively improves recognition accuracy.

(5) Productization: to address the problem that large numbers of new or legacy invoices in the financial field must be entered manually, we have built a smart cloud financial sharing service platform that uses text detection and recognition technology to perform structured recognition of camera-captured invoice images. The algorithms designed in this thesis are the core of the system, and its online deployment verifies their effectiveness and practicality.
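Item (3) builds on Focal Loss, introduced for dense object detection in RetinaNet. It down-weights easy, well-classified examples; the usual motivation is that the many easy background locations would otherwise swamp the few foreground ones, which plausibly matters on invoices where text covers a small part of the page. This page does not say how the thesis adapts the loss to arbitrarily oriented text, so the sketch below shows only the standard binary classification term. The defaults alpha = 0.25 and gamma = 2 are the values reported for RetinaNet, not necessarily the thesis's settings, and the function names are hypothetical.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t)
    for one predicted foreground probability p and label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)          # clamp so log() never sees 0
    p_t = p if y == 1 else 1.0 - p           # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def total_focal_loss(probs, labels, alpha=0.25, gamma=2.0):
    """Sum the focal loss over all locations and normalize by the number of
    positives, as RetinaNet does; max(1, ...) guards images with no text."""
    total = sum(focal_loss(p, y, alpha, gamma) for p, y in zip(probs, labels))
    num_pos = sum(1 for y in labels if y == 1)
    return total / max(1, num_pos)
```

With gamma = 0 and alpha = 0.5 the function reduces to half the ordinary cross-entropy, which is a quick sanity check; with gamma = 2, a positive predicted at p = 0.9 has its loss scaled by (1 - 0.9)^2 = 0.01 relative to gamma = 0, which is how easy examples are suppressed.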

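Keywords: image quality assessment; text detection; text recognition; convolutional neural network
Language: Chinese
Research direction (sub-direction): Text Recognition and Document Analysis
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/23841
Collection: State Key Laboratory of Management and Control for Complex Systems, Image Analysis and Machine Vision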
Recommended citation
GB/T 7714
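王淼. 拍照票据图像识别方法与系统 (Methods and System for Recognizing Camera-Captured Invoice Images)[D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2019.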
Files in this item
File name/size | Document type | Version type | Access type | License
拍照票据图像识别方法与系统.pdf (4089 KB) | Thesis | | Open access | CC BY-NC-SA

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.