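Research on Subtitle Line Recognition Methods Based on Deep Neural Networks (基于深度神经网络的字幕行识别方法研究)
翟传磊 (Zhai Chuanlei)
Subtype: Master of Engineering
Thesis Advisor: 徐波 (Xu Bo)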
2017-05
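Degree Grantor: Graduate School of the Chinese Academy of Sciences (中国科学院研究生院)
Place of Conferral: Beijing
Keyword: deep neural network; image text line recognition; Connectionist Temporal Classification; attention mechanism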
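Abstract
Text is a direct expression of human semantic information and plays an important role in daily life. Text in an image usually reflects the image's content directly, and its recognition is attracting growing attention from researchers. However, complex image backgrounds and low resolution make image text hard to recognize. Deep neural networks have developed rapidly in recent years and have succeeded in many fields. This thesis takes image text line recognition as its research task and uses end-to-end neural network models to recognize a whole line of image text directly. The main contributions are as follows:
A method for generating image text lines based on a clarity criterion is proposed. Training Deep Neural Network (DNN) models requires a large amount of labeled data, but annotating image text one image at a time is too costly. This thesis uses a character-clarity criterion to ensure that the character at every position in the image text meets the clarity requirement, and then evaluates the generated text line image as a whole. Generating a large number of text line samples that meet the clarity requirement satisfies the model's need for labeled training data.
The Connectionist Temporal Classification (CTC) objective function is introduced to Chinese image text line recognition for the first time. The method uses a Recurrent Neural Network (RNN) to model the whole image text line and trains the model with the CTC objective, which avoids explicit segmentation of the text line and also improves character recognition accuracy.
An Encoder-Decoder model with an attention mechanism is applied to Chinese image text recognition for the first time. While recognizing the text line, the model explicitly learns the alignment between input and output. A Convolutional Neural Network (CNN) extracts a feature sequence from the input image and feeds it to the Encoder-Decoder model, so the whole model is trained jointly, achieving truly end-to-end image text line recognition.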
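The CTC-based contribution described above removes the need to cut a text line into characters. As an illustration only, and not the thesis's implementation, the following minimal sketch shows CTC best-path (greedy) decoding. It assumes the RNN emits one score vector per image column (frame) and that label index 0 is the CTC blank; the name `ctc_greedy_decode` and the toy scores are hypothetical.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 is reserved for the CTC blank label


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Best-path CTC decoding.

    frame_scores: list of per-frame score lists, one score per label.
    Returns the label sequence obtained by taking the argmax of each
    frame, merging consecutive repeats, and then dropping blanks.
    """
    # 1. Pick the most likely label independently at every frame.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]
    # 2. Merge runs of identical labels (one character spans many frames).
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Remove blanks; a blank between two equal labels keeps them distinct.
    return [label for label in collapsed if label != blank]


# Toy example with 3 labels (0 = blank). The per-frame argmax path is
# [1, 1, 0, 1, 2, 2, 0], which collapses to the label sequence [1, 1, 2].
toy_scores = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.2, 0.7],
    [0.9, 0.05, 0.05],
]
print(ctc_greedy_decode(toy_scores))  # [1, 1, 2]
```

No frame is ever assigned to a character boundary by hand: the blank label and the merge step let the network place characters wherever they occur along the line, which is why explicit segmentation is unnecessary.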
Other Abstract
As a direct expression of human semantic information, text plays an important role in daily life. Text in an image can directly reflect the image's content, so more and more researchers are paying attention to its recognition. However, complex image backgrounds and the low resolution of text images make recognition challenging. In recent years, deep neural networks have developed rapidly and have been applied successfully in many areas. This thesis focuses on the recognition of image text lines and uses end-to-end neural network models to recognize the whole text line in an image. The main achievements of the dissertation are as follows:
A scheme based on a definition (clarity) criterion is presented for generating text images. Training Deep Neural Network (DNN) based models requires a huge amount of labeled data, but labeling text images one by one is costly. The scheme checks the definition of the character at each position in the image text and then measures the quality of the whole text line image. Generating a large number of text images in this way makes it feasible to train models to recognize image text.
The Connectionist Temporal Classification (CTC) objective function is introduced to image text recognition for the first time. Combining CTC with a Recurrent Neural Network (RNN) to model the whole image text line removes the need to segment the image text explicitly and also achieves better recognition accuracy.
An Encoder-Decoder model with an attention mechanism is applied to image text recognition for the first time. The model not only recognizes the text in the image but also learns the alignment between the input and output sequences. It is trained on the image feature sequence extracted by a Convolutional Neural Network (CNN), which makes the image text recognition truly end-to-end.
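The alignment learned by the attention mechanism can be pictured as a set of weights over the CNN feature sequence, recomputed at every decoding step. The sketch below is a hedged illustration, not the model from the thesis: it swaps in plain dot-product scoring (the abstract does not say which scoring function is used), and the names `softmax` and `attend` as well as the toy vectors are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_features):
    """One attention step with dot-product scoring (an assumption).

    decoder_state: list of floats, the current decoder hidden state.
    encoder_features: list of feature vectors, one per image column.
    Returns (weights, context): weights form a soft alignment over the
    input positions; context is the weighted sum of the features.
    """
    # Score every input position against the current decoder state.
    scores = [sum(q * k for q, k in zip(decoder_state, feat))
              for feat in encoder_features]
    weights = softmax(scores)
    dim = len(encoder_features[0])
    # Context vector: attention-weighted average of the feature sequence.
    context = [sum(w * feat[i] for w, feat in zip(weights, encoder_features))
               for i in range(dim)]
    return weights, context


# Toy usage: three feature vectors, one decoder state.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([0.0, 5.0], features)
print(weights, context)
```

Collecting `weights` across all decoding steps gives a matrix that shows which image columns each output character attended to, which is the explicit input-output alignment the abstract refers to.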
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/14848
Collection: Graduates_Master's Theses
Affiliation: Institute of Automation, Chinese Academy of Sciences
First Author Affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended Citation
GB/T 7714
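翟传磊 (Zhai Chuanlei). Research on Subtitle Line Recognition Methods Based on Deep Neural Networks (基于深度神经网络的字幕行识别方法研究) [D]. Beijing: Graduate School of the Chinese Academy of Sciences, 2017.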
Files in This Item:
File Name/Size: 基于深度神经网络的字幕行识别方法研究_翟 (6784KB)
DocType: Thesis
Access: Not yet open
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.