Research on Caption Line Recognition Methods Based on Deep Neural Networks


	Research on Caption Line Recognition Methods Based on Deep Neural Networks
	Zhai Chuanlei (翟传磊)
	2017-05
Degree type	Master of Engineering
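Chinese abstract (translated)	Text is a direct expression of human semantic information and plays an important role in daily life. Text in an image usually reflects the image's content directly, and its recognition is attracting growing research attention. However, complex image backgrounds and low resolution make image text difficult to recognize. Deep neural networks have advanced rapidly in recent years and have succeeded in many fields. This thesis takes image text line recognition as its research task and uses end-to-end neural network models to recognize an entire line of image text directly. The main contributions are as follows. (1) A method for generating image text lines based on a sharpness criterion is proposed. Training Deep Neural Network (DNN) models requires large amounts of labeled data, but annotating image text one image at a time is too costly. This thesis applies a character-level sharpness criterion to ensure that the character at every position in the image text meets the sharpness requirement, and then evaluates the generated text line image as a whole. Generating a large number of text line samples that satisfy the sharpness requirement meets the model's need for labeled training data. (2) The Connectionist Temporal Classification (CTC) objective function is introduced to Chinese image text line recognition for the first time. This approach models the whole image text line with a Recurrent Neural Network (RNN) and trains the model with the CTC objective, which avoids explicit segmentation of the text line and also improves character recognition accuracy. (3) An Encoder-Decoder model with an attention mechanism is applied to Chinese image text recognition for the first time. While recognizing the text line, the model explicitly learns the alignment between input and output. A Convolutional Neural Network (CNN) extracts a feature sequence from the input image and feeds it to the Encoder-Decoder model, so the whole model is trained jointly, achieving end-to-end image text line recognition in the true sense.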
English abstract	As a direct expression of human semantic information, text plays an important role in daily life. Text in an image can directly reflect the image's content, so its recognition is attracting more and more researchers. However, complex image backgrounds and low text resolution make recognition very challenging. In recent years, deep neural networks have developed rapidly and have been applied successfully in many areas. This thesis focuses on image text recognition and uses end-to-end neural network models to recognize whole text lines in images. The main achievements of the thesis are as follows. First, a scheme based on a sharpness criterion is presented for generating text images. Training Deep Neural Network (DNN) based models needs a huge amount of labeled data, but labeling text images one by one is costly. The quality of a whole text line image is measured by considering the sharpness of the character at each position. Generating a large number of text images makes it feasible to train models for image text recognition. Second, the Connectionist Temporal Classification (CTC) objective function is introduced to Chinese image text recognition for the first time. Combining CTC with a Recurrent Neural Network (RNN) to model the whole text line removes the need to segment the image text explicitly and also achieves better recognition accuracy. Third, an Encoder-Decoder model with an attention mechanism is applied to Chinese image text recognition for the first time. The model not only recognizes the text in the image but also learns the alignment between the input and output sequences. The model is trained on the image feature sequence extracted by a Convolutional Neural Network (CNN), thus truly implementing end-to-end image text recognition.
Keywords	deep neural network; image text line recognition; connectionist temporal classification; attention mechanism
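Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/14848
Collection	Graduates_Master's theses
Author affiliation	Institute of Automation, Chinese Academy of Sciences
First author affiliation	Institute of Automation, Chinese Academy of Sciences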
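Recommended citation (GB/T 7714)	Zhai Chuanlei. Research on Caption Line Recognition Methods Based on Deep Neural Networks[D]. Beijing. Graduate University of Chinese Academy of Sciences, 2017.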
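
The abstract states that the CTC objective removes the need to segment a text line into characters explicitly. As a rough illustration of why, the sketch below shows generic greedy CTC decoding: the network emits one score vector per image frame, and the label sequence is recovered by merging consecutive repeats and dropping a special blank label. This is not the thesis's implementation; the function name `ctc_greedy_decode`, the blank index 0, and the toy scores are all assumptions made for the example.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label (hypothetical choice)


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Turn per-frame label scores into a label sequence (greedy CTC decoding).

    frame_scores: list of frames, each a list of scores, one per label.
    """
    # Step 1: take the highest-scoring label in every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Step 2: merge consecutive repeats. Step 3: drop blank labels.
    return [label for label, _ in groupby(best_path) if label != blank]


# Toy input: 7 frames over 3 labels (0 = blank, 1 and 2 = characters).
frames = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
    [0.8, 0.1, 0.1],
]
decoded = ctc_greedy_decode(frames)
```

The blank frame between the two runs of label 1 is what lets the same character appear twice in a row in the output, so no character boundaries have to be marked in the input image.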

Files in this item
File name/size	Document type	Version type	Access type	License
基于深度神经网络的字幕行识别方法研究_翟 (6784KB)	Thesis		Restricted access	CC BY-NC-SA