CASIA OpenIR > National Laboratory of Pattern Recognition (模式识别国家重点实验室)
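Thesis Advisor: 刘成林 (Liu Chenglin)
Degree Grantor: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Name: Master of Engineering
Degree Discipline: Computer Applied Technology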

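Text recognition (recognizing text in images and converting it into digital codes) is in broad demand across applications. In recent years, with the rise and development of deep learning, text recognition algorithms have improved markedly in novelty, practicality, and efficiency. However, most of these algorithms target high-quality text images. In real applications, uneven illumination, differences in camera focus, and shaking of the capture device cause varying degrees of image distortion and blur. Such low-quality images reduce recognition accuracy and fail to meet practical requirements. This thesis therefore studies text recognition methods for low-quality images, mainly using super-resolution algorithms to restore low-quality text images and thereby improve recognizer performance. The main work consists of the following two parts: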

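1. Multiple super-resolution algorithms are evaluated and improved for text recognition. First, the performance of 10 cutting-edge super-resolution algorithms is compared on TextZoom, a dataset of low-quality scene text images, and three recognition algorithms (ASTER, MORAN, CRNN) are used to measure recognition accuracy on the generated images. On this basis, a spatial transformer network and a gradient profile loss are introduced to improve the output of each super-resolution algorithm. Second, the thesis proposes an optimization algorithm for low-quality text image generation. The algorithm uses gradients back-propagated from the recognizer to guide the learning of the generator, thereby improving recognition. By fixing the recognizer's parameters and introducing a recognition loss, it further improves recognition accuracy and effectively alleviates the difficulty of recognizing text in low-quality images.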

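2. A text recognition framework based on super-resolution and generative adversarial networks, SRR-GAN, is proposed. The framework improves on the conventional cascade scheme (image super-resolution and text recognition performed as separate steps) by integrating the text recognition and super-resolution tasks within an adversarial learning framework. By jointly training the recognition model and the super-resolution model, the framework lets the network learn more general features from images of different resolutions, so that it maintains high recognition accuracy across images of different resolutions.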
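
The gradient profile loss mentioned in item 1 can be illustrated with a small sketch. The abstract does not give the exact formulation used in the thesis, so the version below is an assumption: it takes the mean absolute difference between the gradient-magnitude maps of a super-resolved image and its high-resolution reference, with gradients approximated by forward differences. The function names `gradient_magnitude` and `gradient_profile_loss` are hypothetical, and images are plain nested lists of grayscale values so that the example needs no external library.

```python
import math


def gradient_magnitude(img):
    """Per-pixel gradient magnitude via forward differences.

    Differences that would cross the right or bottom border are set to 0.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = img[y][x + 1] - img[y][x] if x + 1 < w else 0.0
            dy = img[y + 1][x] - img[y][x] if y + 1 < h else 0.0
            out[y][x] = math.hypot(dx, dy)
    return out


def gradient_profile_loss(sr, hr):
    """Mean absolute difference between the gradient maps of two images.

    Assumed form for illustration only; not taken from the thesis.
    """
    if len(sr) != len(hr) or len(sr[0]) != len(hr[0]):
        raise ValueError("images must have the same shape")
    g_sr, g_hr = gradient_magnitude(sr), gradient_magnitude(hr)
    total = sum(
        abs(a - b)
        for row_sr, row_hr in zip(g_sr, g_hr)
        for a, b in zip(row_sr, row_hr)
    )
    return total / (len(hr) * len(hr[0]))


# Toy data: a sharp vertical edge (reference) and a blurred version of it.
hr = [[0.0, 0.0, 1.0, 1.0],
      [0.0, 0.0, 1.0, 1.0]]
sr = [[0.0, 0.25, 0.75, 1.0],
      [0.0, 0.25, 0.75, 1.0]]

print(gradient_profile_loss(sr, hr))  # positive: the blurred edge is penalized
print(gradient_profile_loss(hr, hr))  # zero: identical gradient maps
```

A loss of this kind rewards sharp stroke boundaries rather than only pixel-wise similarity, which is one plausible reason it would help a recognizer downstream.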

Other Abstract

Text carries rich and accurate semantic information, which is very important in many visual application scenarios. Therefore, text recognition has long been an active research topic in computer vision and pattern recognition. In recent years, with the development of deep learning, numerous text recognition algorithms have shown novelty, practicality and efficiency. However, these algorithms mainly focus on high-quality text images. Text images can be distorted in many application scenarios: nonuniform illumination, camera defocus, motion blur from camera shake, and low resolution can all lead to low-quality text images, which degrade recognition accuracy. This thesis studies recognition methods for low-quality text images, using super-resolution algorithms for image restoration so as to improve recognition performance. The main contributions lie in the following two aspects:

1. Several super-resolution algorithms are evaluated and improved for text recognition. First, we compare 10 classical super-resolution algorithms on TextZoom, a low-quality scene text image dataset, and use three recognition algorithms (ASTER, MORAN, CRNN) to test the restored images. We further use the Spatial Transformer Network (STN) and a gradient profile loss to improve the restored image quality of each super-resolution algorithm. We also propose to improve the recognition results using the gradient of the recognition loss: by fixing the parameters of the recognizer and introducing the recognition loss, the recognition accuracy can be improved further. The effectiveness of these techniques is verified in experiments.

2. A novel framework named SRR-GAN (Super-Resolution based Recognition with Generative Adversarial Networks), based on super-resolution and adversarial learning, is proposed. The framework improves on existing methods that adopt the cascade scheme by integrating text recognition with super-resolution via adversarial learning. Through joint training of the recognition and super-resolution models, it learns more general features of images of varying quality, so as to yield higher recognition performance for both high-resolution and low-resolution images.
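
The evaluation protocol in item 1, where restored images are passed to several recognizers and scored by recognition accuracy, can be sketched as follows. This is a minimal illustration under stated assumptions: accuracy is taken to be exact word match after lowercasing and keeping only letters and digits, which is a common convention for scene text benchmarks but is not confirmed by the abstract. The helper names `normalize` and `word_accuracy` are hypothetical, and the predicted strings are made-up placeholders, not results from the thesis.

```python
import string

# Characters kept when comparing a prediction with its label (assumed convention).
ALPHANUM = set(string.ascii_lowercase + string.digits)


def normalize(text):
    """Lowercase the text and drop everything except letters and digits."""
    return "".join(ch for ch in text.lower() if ch in ALPHANUM)


def word_accuracy(predictions, labels):
    """Fraction of images whose normalized prediction equals the normalized label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(normalize(p) == normalize(t) for p, t in zip(predictions, labels))
    return correct / len(labels)


# Placeholder labels and recognizer outputs, for illustration only.
labels = ["Hotel", "EXIT", "cafe"]
outputs = {
    "ASTER": ["hotel", "exit", "cafe"],
    "MORAN": ["hotel", "exlt", "cafe"],
    "CRNN": ["hote1", "exit", "care"],
}

for name, preds in outputs.items():
    print(f"{name}: {word_accuracy(preds, labels):.3f}")
```

Running the same scoring function over the outputs of each super-resolution method keeps the comparison between recognizers and between restoration methods on a single, consistent metric.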

Document Type: Thesis
Recommended Citation
GB/T 7714
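许铭潮 (Xu Mingchao). 低质图像文本识别方法研究 (Research on Text Recognition Methods for Low-Quality Images) [D]. Institute of Automation, Chinese Academy of Sciences, 2020.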
Files in This Item:
File Name/Size | DocType | Version | Access | License
许铭潮毕业论文V9.pdf (4602KB) | Thesis | | Open Access | CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.