Research on String Recognition Methods Based on Sliding-Window Classification
高立崑
2021-06
Pages: 90
Degree type: Master's
Chinese abstract

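With the rise of deep-learning-based document image recognition, the accuracy of string recognition has improved steadily and the technology is now widely applied. However, string recognition methods based on end-to-end deep learning still have shortcomings: they cannot locate characters accurately, and their output confidence is unreliable. This thesis therefore studies string recognition methods based on sliding windows and connectionist temporal classification (CTC), and improves both the training convergence and the recognition accuracy of the model.
The main work and contributions of the thesis are summarized as follows:
1. An improved CTC method based on pseudo-label distributions is proposed.
A theoretical analysis of the CTC algorithm shows that it can be interpreted as an application of the expectation-maximization (EM) algorithm to sequence recognition. Using the network's prediction for each frame, the algorithm estimates the pseudo-label distribution of each frame with the forward-backward algorithm, and then uses the estimated distribution to guide network convergence. Based on this interpretation, an improved CTC method is proposed that comprises two strategies: a regularization strategy based on pseudo-label sparsification, and a voting-based decoding algorithm. Experiments on handwritten digit string recognition and handwritten English text line recognition show that the method improves the training convergence and recognition accuracy of CTC-based string recognition.
2. A string recognition method based on the convolutional prototype classifier is proposed.
The convolutional prototype classifier offers high character recognition accuracy and reliable output confidence. We use it for sliding-window classification in string recognition, add a character position estimation step to the end-to-end training of the model, and place more training emphasis on better-aligned frames, thereby improving both recognition and alignment. Experiments on handwritten digit strings and on handwritten English and Chinese text lines show that the method achieves competitive recognition performance.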

English abstract

With the development of deep learning, character string recognition methods have steadily improved in accuracy. However, string recognition methods based on end-to-end deep learning fall short in locating characters and in producing reliable confidence estimates. This thesis therefore studies character string recognition methods based on sliding windows and connectionist temporal classification (CTC), and achieves better training convergence and higher recognition accuracy.
The main contributions of the thesis are summarized as follows:
1. An improved CTC method based on pseudo-label distribution is proposed.
Our theoretical analysis of the CTC algorithm shows that it can be interpreted as the expectation-maximization (EM) algorithm applied to sequence recognition. Using the model's prediction for each frame, CTC estimates a pseudo-label distribution for that frame with the forward-backward algorithm and trains the model with a cross-entropy loss against it. Based on this interpretation, an improved CTC method is proposed that comprises two strategies: a regularization strategy based on pseudo-label sparsification and a voting-based decoding algorithm. Experiments on handwritten digit string recognition and handwritten English text line recognition show that the method improves the convergence and recognition accuracy of CTC-based string recognition.
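The EM reading of CTC described above can be illustrated with a small NumPy sketch of the standard forward-backward recursion. It is not the thesis's implementation: the function name `ctc_pseudo_labels`, the choice of class 0 as the blank, and the use of plain (non-log) probabilities are assumptions made for illustration, and neither the sparsification regularizer nor the voting-based decoder is shown, since the abstract does not specify them.

```python
import numpy as np

def ctc_pseudo_labels(probs, target, blank=0):
    """Standard CTC forward-backward (hypothetical helper, not from the thesis).

    probs  : (T, K) array of per-frame class probabilities (rows sum to 1).
    target : list of non-blank class indices.
    Returns (gamma, likelihood), where gamma[t, k] is the posterior probability
    that frame t emits class k given the target, i.e. the per-frame pseudo-label.
    """
    T, K = probs.shape
    # Extended label sequence: a blank before, between, and after the symbols.
    ext = [blank]
    for c in target:
        ext += [c, blank]
    S = len(ext)

    # Forward pass: alpha[t, s] sums all valid prefixes ending in state s at frame t.
    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]
            if s >= 1:
                a += alpha[t - 1, s - 1]
            # Skipping a blank is allowed only between two different symbols.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]
            alpha[t, s] = a * probs[t, ext[s]]

    # Backward pass: beta[t, s] sums all valid suffixes starting in state s at frame t.
    beta = np.zeros((T, S))
    beta[T - 1, S - 1] = probs[T - 1, ext[S - 1]]
    if S > 1:
        beta[T - 1, S - 2] = probs[T - 1, ext[S - 2]]
    for t in range(T - 2, -1, -1):
        for s in range(S):
            b = beta[t + 1, s]
            if s + 1 < S:
                b += beta[t + 1, s + 1]
            if s + 2 < S and ext[s] != blank and ext[s] != ext[s + 2]:
                b += beta[t + 1, s + 2]
            beta[t, s] = b * probs[t, ext[s]]

    likelihood = alpha[T - 1, S - 1] + (alpha[T - 1, S - 2] if S > 1 else 0.0)

    # E-step: accumulate the state posteriors into per-class pseudo-labels.
    gamma = np.zeros((T, K))
    for t in range(T):
        for s in range(S):
            # alpha and beta both include probs[t, ext[s]], so divide it out once.
            gamma[t, ext[s]] += alpha[t, s] * beta[t, s] / probs[t, ext[s]]
    return gamma / likelihood, likelihood

# Usage: treat gamma as a fixed target and score the frames with cross-entropy.
rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 4))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
gamma, likelihood = ctc_pseudo_labels(probs, target=[1, 2, 2])
cross_entropy = -(gamma * np.log(probs)).sum()
```

With `gamma` held fixed, the gradient of this cross-entropy with respect to the softmax logits is `probs - gamma`, which matches the gradient of the usual CTC loss `-log(likelihood)`. A practical implementation would work in log space to avoid underflow on long sequences.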
2. A string recognition method based on the convolutional prototype classifier is proposed.
The convolutional prototype classifier yields high character recognition accuracy and reliable output confidence. We therefore use it for sliding-window classification in string recognition. A character position estimation step is added to the end-to-end training of the model, and training concentrates on the more accurately aligned frames, which improves both recognition and alignment. Experimental results on handwritten digit strings and on handwritten English and Chinese text lines show that the method achieves recognition performance competitive with state-of-the-art methods.
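As a rough illustration of prototype-based classification of sliding windows, the sketch below scores each window's feature vector by its squared Euclidean distance to one prototype per class and turns the negative distances into a softmax posterior. This is a generic distance-based prototype classifier, not the thesis's model: the convolutional feature extractor is replaced by hand-written feature vectors, and the function name `prototype_posteriors`, the single prototype per class, and the `scale` parameter are all assumptions. The position estimation and frame weighting steps are not shown.

```python
import numpy as np

def prototype_posteriors(features, prototypes, scale=1.0):
    """Distance-based class posteriors (hypothetical helper, not from the thesis).

    features   : (T, D) array, one feature vector per sliding window.
    prototypes : (K, D) array, one prototype per class (an assumption).
    Returns a (T, K) array of softmax posteriors over negative squared distances.
    """
    diff = features[:, None, :] - prototypes[None, :, :]   # (T, K, D) by broadcasting
    sq_dist = (diff ** 2).sum(axis=-1)                      # (T, K)
    logits = -scale * sq_dist
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Usage: three classes in a 2-D feature space, three windows.
prototypes = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
windows = np.array([
    [0.1, -0.2],   # close to prototype 0
    [3.8, 0.3],    # close to prototype 1
    [2.0, 0.0],    # halfway between prototypes 0 and 1
])
post = prototype_posteriors(windows, prototypes)
labels = post.argmax(axis=1)
```

The third window lies at equal distance from prototypes 0 and 1, so it receives the same posterior for both classes. A window that straddles two characters therefore yields a visibly less confident output than one centered on a character.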

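Keywords: string recognition, connectionist temporal classification (CTC), expectation-maximization (EM) algorithm, convolutional prototype classifier
Language: Chinese
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/45028
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems_Pattern Analysis and Learning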
Recommended citation
GB/T 7714
高立崑. 基于滑动窗分类的字符串识别方法研究[D]. 中国科学院自动化研究所智能化大厦三楼第五会议室. 中国科学院自动化研究所,2021.
Files in this item
File name/size: ucasthesis__glk.pdf (6736KB)
Document type: Thesis
Access type: Open access
License: CC BY-NC-SA