Research on String Recognition Methods Based on Sliding-Window Classification
高立崑
Subtype: Master's
Thesis Advisor: 刘成林
2021-06
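Degree Grantor: Institute of Automation, Chinese Academy of Sciences
Place of Conferral: Fifth Conference Room, 3rd Floor, Intelligence Building, Institute of Automation, Chinese Academy of Sciences
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: string recognition, connectionist temporal classification (CTC), expectation-maximization (EM) algorithm, convolutional prototype classifier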
Abstract

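With the rise of deep-learning-based document image recognition, string recognition
accuracy has improved steadily and the technology is now widely applied. End-to-end
deep learning methods for string recognition still have shortcomings, however: they
cannot locate characters accurately, and their output confidence is unreliable. This
thesis therefore studies string recognition methods based on sliding windows and
connectionist temporal classification (CTC), improving both training convergence and
recognition accuracy.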
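The main work and contributions of the thesis are as follows:
1. An improved CTC method based on pseudo-label distributions is proposed.
A theoretical analysis shows that CTC can be interpreted as an application of the
expectation-maximization (EM) algorithm to sequence recognition. Using the network's
prediction for each frame, the algorithm estimates a pseudo-label distribution for each
frame with the forward-backward algorithm, then uses the estimated distribution to guide
network convergence. Based on this interpretation, an improved CTC method is proposed
with two strategies: a regularization strategy based on pseudo-label sparsification and
a voting-based decoding algorithm. Experiments on handwritten digit string recognition
and handwritten English text line recognition show that the method improves the training
convergence and recognition accuracy of CTC-based string recognition.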
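2. A string recognition method based on the convolutional prototype classifier is proposed.
The convolutional prototype classifier offers high character recognition accuracy and
reliable output confidence. We use it for sliding-window classification in string
recognition, add a character position estimation step to the end-to-end training of the
model, and place more training emphasis on better-aligned frames, which improves both
recognition and alignment. Experiments on handwritten digit strings and on handwritten
English and Chinese text lines show that the method achieves competitive recognition
performance.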
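
The per-frame pseudo-label estimation described in contribution 1 can be sketched as follows. This is a minimal illustration of the standard CTC forward-backward recursion, not the thesis's implementation: the function name `ctc_pseudo_labels`, the blank index 0, and the plain-probability (non-log) arithmetic are assumptions made here for brevity, and the sparsification regularizer and voting-based decoder are not shown.

```python
def ctc_pseudo_labels(probs, target, blank=0):
    """Per-frame label posteriors under CTC (hypothetical helper).

    probs  : T x K nested list of strictly positive per-frame class probabilities
    target : list of label ids without blanks
    Returns (pseudo, likelihood), where pseudo[t][k] is the posterior
    probability that frame t emits class k given the target string.
    """
    T, K = len(probs), len(probs[0])
    # Extended target: blanks between and around the labels.
    ext = [blank]
    for c in target:
        ext += [c, blank]
    S = len(ext)

    def can_skip(s):
        # Jumping from s-2 to s is allowed only onto a non-blank label
        # that differs from the label two positions back.
        return s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]

    # Forward pass: alpha[t][s] sums all prefixes ending at position s at time t.
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][ext[0]]
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]
            if s >= 1:
                a += alpha[t - 1][s - 1]
            if can_skip(s):
                a += alpha[t - 1][s - 2]
            alpha[t][s] = a * probs[t][ext[s]]

    # Backward pass: beta[t][s] sums all suffixes starting at position s at time t.
    beta = [[0.0] * S for _ in range(T)]
    beta[T - 1][S - 1] = probs[T - 1][ext[S - 1]]
    if S > 1:
        beta[T - 1][S - 2] = probs[T - 1][ext[S - 2]]
    for t in range(T - 2, -1, -1):
        for s in range(S):
            b = beta[t + 1][s]
            if s + 1 < S:
                b += beta[t + 1][s + 1]
            if s + 2 < S and can_skip(s + 2):
                b += beta[t + 1][s + 2]
            beta[t][s] = b * probs[t][ext[s]]

    likelihood = alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)

    # E-step view: posterior of each class at each frame. Both alpha and beta
    # include the emission at time t, so it is divided out once.
    pseudo = [[0.0] * K for _ in range(T)]
    for t in range(T):
        for s in range(S):
            k = ext[s]
            pseudo[t][k] += alpha[t][s] * beta[t][s] / (probs[t][k] * likelihood)
    return pseudo, likelihood
```

For two frames with uniform outputs over {blank, 1} and target `[1]`, three alignments are valid (`1 1`, `blank 1`, `1 blank`), so the likelihood is 0.75 and the first frame's pseudo-label distribution is (1/3, 2/3). Training against these posteriors with a cross-entropy loss corresponds to the M-step in the EM reading of CTC.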

Other Abstract

With the development of deep learning, character string recognition methods have
steadily improved in accuracy. However, end-to-end deep learning methods for string
recognition still fall short in locating characters and in producing reliable output
confidence. This thesis therefore studies character string recognition methods based on
sliding windows and connectionist temporal classification (CTC), and achieves better
training convergence and higher recognition accuracy.
The main contributions of the thesis are summarized as follows:
1. An improved CTC method based on pseudo-label distribution is proposed.
Our theoretical analysis shows that CTC can be interpreted as an application of the
expectation-maximization (EM) algorithm to sequence recognition. Using the model's
prediction for each frame, CTC estimates a per-frame pseudo-label distribution with the
forward-backward algorithm and trains the model against it with a cross-entropy loss.
Based on this interpretation, an improved CTC method is proposed with two strategies:
a regularization strategy based on pseudo-label sparsification and a voting-based
decoding algorithm. Experiments on handwritten digit string recognition and handwritten
English text line recognition show that the method improves the convergence and
recognition accuracy of CTC-based string recognition.
2. A string recognition method based on the convolutional prototype classifier is proposed.
The convolutional prototype classifier offers high character recognition accuracy and
reliable output confidence, so we use it for sliding-window classification in string
recognition. A character position estimation step is added to the end-to-end training
of the model, and training concentrates on the more accurately aligned frames, which
improves alignment. Experimental results on handwritten digit strings and on handwritten
English and Chinese text lines show that the method achieves recognition performance
competitive with state-of-the-art methods.
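
The sliding-window classification in contribution 2 can be illustrated with a small prototype-based classifier. This is a sketch under stated assumptions rather than the thesis's model: it assumes one prototype per class, squared Euclidean distance, and a softmax over negative distances scaled by a hypothetical parameter `gamma`; the convolutional feature extractor and the character position estimation step are omitted, and window features are passed in directly.

```python
import math

def prototype_posteriors(feature, prototypes, gamma=1.0):
    """Class posteriors from distances to class prototypes (assumed form)."""
    # Squared Euclidean distance from the feature to each class prototype.
    d2 = [sum((f - p) ** 2 for f, p in zip(feature, proto)) for proto in prototypes]
    # Softmax over negative scaled distances, shifted by the max for stability.
    logits = [-gamma * d for d in d2]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_windows(window_features, prototypes, gamma=1.0):
    """Return (class index, confidence) for each sliding-window feature."""
    results = []
    for feat in window_features:
        post = prototype_posteriors(feat, prototypes, gamma)
        k = max(range(len(post)), key=post.__getitem__)
        results.append((k, post[k]))
    return results
```

In this formulation a window whose feature lies midway between two prototypes receives equal posteriors for both classes, so the confidence falls as the feature moves away from its nearest prototype relative to the others.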

Pages: 90
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/45028
Collection: National Laboratory of Pattern Recognition > Pattern Analysis and Learning
Recommended Citation
GB/T 7714
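高立崑. Research on String Recognition Methods Based on Sliding-Window Classification [D]. Fifth Conference Room, 3rd Floor, Intelligence Building, Institute of Automation, Chinese Academy of Sciences. Institute of Automation, Chinese Academy of Sciences, 2021.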
Files in This Item:
File Name/Size: ucasthesis__glk.pdf (6736KB)
DocType: Thesis
Access: Open Access
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.