Research on Character String Recognition Methods Based on Sliding-Window Classification

	Research on Character String Recognition Methods Based on Sliding-Window Classification
	Gao Likun (高立崑)
	2021-06
Pages	90
Degree type	Master's
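Chinese abstract (English translation)	With the rise of deep-learning-based document image recognition, the accuracy of character string recognition has improved steadily and the technology has been widely deployed. However, string recognition methods based on end-to-end deep learning still have shortcomings: they cannot locate characters accurately, and their output confidence is unreliable. This thesis therefore studies string recognition methods based on sliding windows and connectionist temporal classification (CTC), and improves both the training convergence and the recognition accuracy of the model. The main work and contributions are as follows: 1. An improved CTC method based on pseudo-label distributions is proposed. A theoretical analysis shows that CTC can be interpreted as an application of the expectation-maximization (EM) algorithm to sequence recognition: the algorithm takes the network's prediction for each frame, estimates the pseudo-label distribution of each frame with the forward-backward algorithm, and then uses the estimated distribution to guide the convergence of the network. Based on this interpretation, an improved CTC method is proposed with two strategies: a regularization strategy based on pseudo-label sparsification, and a voting-based decoding algorithm. Experiments on handwritten digit string recognition and handwritten English text line recognition show that the method improves the training convergence and recognition accuracy of CTC-based string recognition. 2. A string recognition method based on the convolutional prototype classifier is proposed. The convolutional prototype classifier offers high character recognition accuracy and reliable output confidence. We use it for sliding-window classification in string recognition, add a character position estimation step to the end-to-end training of the model, and place more training emphasis on better-aligned frames, which improves both recognition and alignment. Experiments on handwritten digit strings and on handwritten English and Chinese text lines show that the method achieves competitive recognition performance.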
English abstract	With the development of deep learning, character string recognition methods have steadily improved in accuracy. However, string recognition methods based on end-to-end deep learning fall short in locating characters and in producing reliable confidence. Therefore, this thesis studies character string recognition methods based on sliding windows and connectionist temporal classification (CTC), and achieves better convergence and improved recognition accuracy of the model. The main contributions of the thesis are summarized as follows: 1. An improved CTC method based on pseudo-label distribution is proposed. Our theoretical analysis of the CTC algorithm shows that it can be explained as the expectation-maximization (EM) algorithm applied to sequence recognition. Using the model prediction for each frame, CTC estimates the pseudo-label distribution through the forward-backward algorithm and trains the model with a cross-entropy loss. Based on this explanation, an improved CTC method is proposed, which contains two improvement strategies: a regularization strategy based on pseudo-label distribution and a voting-based decoding algorithm. Experiments on handwritten digit string recognition and handwritten English text line recognition show that our methods improve the convergence and recognition accuracy of the CTC string recognition method. 2. A string recognition method based on the convolutional prototype classifier is proposed. The convolutional prototype classifier yields high character recognition accuracy and reliable output confidence. We therefore use the convolutional prototype classifier for sliding-window classification in string recognition. In the end-to-end training of the model, a character position estimation step is added to improve alignment by concentrating on more accurately aligned frames. Experimental results on handwritten digit strings and on handwritten English and Chinese text lines show that this method achieves competitive recognition performance compared to state-of-the-art methods.
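Keywords	character string recognition, connectionist temporal classification (CTC) algorithm, expectation-maximization (EM) algorithm, convolutional prototype classifier
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/45028
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems_Pattern Analysis and Learning
Recommended citation (GB/T 7714)	Gao Likun. Research on Character String Recognition Methods Based on Sliding-Window Classification [D]. Conference Room 5, 3rd Floor, Intelligence Building, Institute of Automation, Chinese Academy of Sciences. Institute of Automation, Chinese Academy of Sciences, 2021.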

Files in this item
File name/size	Document type	Version type	Access type	License
ucasthesis__glk.pdf (6736KB)	Thesis		Open access	CC BY-NC-SA
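
The EM reading of CTC summarized in the abstract can be illustrated with a small sketch. It is not taken from the thesis: it is the textbook CTC forward-backward recursion written in plain Python, and the function names `ctc_pseudo_labels` and `soft_cross_entropy` are hypothetical. The first function plays the role of the E-step (per-frame pseudo-label distribution given the target string); the second is the cross-entropy that an M-step would minimize with the pseudo-labels held fixed. The thesis's sparsification-based regularization and voting-based decoding are not shown. Probabilities are kept in linear space for readability, so this is only suitable for short sequences.

```python
import math


def ctc_pseudo_labels(probs, target, blank=0):
    """E-step sketch: per-frame posterior over classes given the target.

    probs  : T x C list of per-frame class probabilities (rows sum to 1)
    target : list of label indices, without blanks
    returns (likelihood, gamma) where gamma is a T x C list of pseudo-labels
    """
    T, C = len(probs), len(probs[0])
    # Extended label sequence: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for c in target:
        ext += [c, blank]
    S = len(ext)

    def can_skip(s):
        # Jump s-2 -> s is allowed only between two different non-blank labels.
        return s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]

    # Forward pass: alpha[t][s] = prob. of all prefixes ending in state s at t.
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][ext[0]]
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]
            if s >= 1:
                a += alpha[t - 1][s - 1]
            if can_skip(s):
                a += alpha[t - 1][s - 2]
            alpha[t][s] = a * probs[t][ext[s]]

    # Backward pass: beta[t][s] = prob. of all suffixes after frame t
    # (the emission at frame t itself is excluded, so no division is needed).
    beta = [[0.0] * S for _ in range(T)]
    beta[T - 1][S - 1] = 1.0
    if S > 1:
        beta[T - 1][S - 2] = 1.0
    for t in range(T - 2, -1, -1):
        for s in range(S):
            b = beta[t + 1][s] * probs[t + 1][ext[s]]
            if s + 1 < S:
                b += beta[t + 1][s + 1] * probs[t + 1][ext[s + 1]]
            if s + 2 < S and can_skip(s + 2):
                b += beta[t + 1][s + 2] * probs[t + 1][ext[s + 2]]
            beta[t][s] = b

    likelihood = alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)
    if likelihood <= 0.0:
        raise ValueError("target cannot be aligned to this many frames")

    # Pseudo-label distribution: merge extended states that share a class.
    gamma = [[0.0] * C for _ in range(T)]
    for t in range(T):
        for s in range(S):
            gamma[t][ext[s]] += alpha[t][s] * beta[t][s] / likelihood
    return likelihood, gamma


def soft_cross_entropy(probs, gamma):
    """M-step objective sketch: cross-entropy with gamma as fixed soft targets."""
    return -sum(
        g * math.log(p)
        for p_row, g_row in zip(probs, gamma)
        for p, g in zip(p_row, g_row)
        if g > 0.0
    )


# Toy example: 3 frames, classes {0: blank, 1: one character}, target "1".
likelihood, gamma = ctc_pseudo_labels([[0.5, 0.5]] * 3, [1])
loss = soft_cross_entropy([[0.5, 0.5]] * 3, gamma)
```

In the toy example, six of the eight possible frame labelings collapse to the target, so the likelihood is 0.75, and the middle frame receives pseudo-label probability 2/3 for the character and 1/3 for blank. A practical implementation would work in log space and rely on a framework's built-in CTC loss.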