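视频文字检测与抽取技术研究 (Research on Video Text Detection and Extraction Techniques)
Alternative Title: Research on Video Text Detection and Extraction Algorithm
Author: 李心洁
Subtype: Master of Engineering
Thesis Advisor: 王春恒
Date: 2010-05-29
Degree Grantor: Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Place of Conferral: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: Text Detection; Text Extraction; Connected Component Analysis; Sparse Representation; Edge Detection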
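Abstract: With the rapid development of digital technology, more and more databases contain images and video in addition to text. How to make computers automatically understand and analyze image and video content has attracted the attention of many researchers in China and abroad. Text in images and video directly carries semantic features and provides rich semantic information for describing image and video content, so text in video is a very important feature for video understanding and analysis. This thesis studies the problem of detecting and extracting text from video. The main contributions are:

1. A text detection method based on edge density features and a pyramid strategy is implemented. The image is first converted to an edge image. Edge density features are then obtained by fast computation and used to filter out most non-text regions, after which connected component analysis, vertical projection, and horizontal projection yield candidate text lines. To detect text of different sizes, a pyramid decomposition strategy is adopted: the image is decomposed into different scales, detection is run at each scale, and the results from all scales are fused to obtain the final text lines. Experimental analysis shows that the method is unaffected by text size, illumination, and similar factors, and that it is simple and fast.

2. A text detection method based on sparse representation is proposed. The method uses a coarse-to-fine detection framework: fast coarse detection with edge density features produces candidate text lines, and a sparse-representation-based classifier then classifies the candidates and removes falsely detected text lines. Experimental results show that the method achieves high precision and recall.

3. The background of video images is usually complex, so traditional binarization methods cannot extract the text effectively. We therefore use a method based on connected component analysis and histograms to convert text images into white-on-black or black-on-white images that optical character recognition (OCR) software can recognize. Experimental results show that the method works well when the text color is fairly uniform and there is a certain contrast between the text and background colors.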
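The coarse stage of the first method (edge density filtering repeated over pyramid scales) can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the gradient operator, the use of an integral image for the "fast computation", the block size `win`, the threshold `thresh`, and every function name are assumptions, and the connected component analysis and projection steps are omitted. Results from different scales are fused here by a plain union of boxes.

```python
def edge_map(img):
    """Sum of absolute horizontal and vertical differences (assumed edge operator)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            out[y][x] = (abs(img[y][x + 1] - img[y][x])
                         + abs(img[y + 1][x] - img[y][x]))
    return out


def integral(img):
    """Integral image: ii[y][x] is the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def edge_density_mask(img, win=4, thresh=20.0):
    """Mark each win x win block whose mean edge strength exceeds thresh."""
    ii = integral(edge_map(img))
    rows, cols = len(img) // win, len(img[0]) // win
    mask = [[False] * cols for _ in range(rows)]
    for by in range(rows):
        for bx in range(cols):
            y0, x0 = by * win, bx * win
            y1, x1 = y0 + win, x0 + win
            total = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
            mask[by][bx] = total / (win * win) > thresh
    return mask


def downsample(img):
    """Halve the resolution by averaging 2 x 2 neighbourhoods."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) // 4
             for x in range(w)] for y in range(h)]


def detect_multiscale(img, levels=2, win=4, thresh=20.0):
    """Return candidate boxes (x0, y0, x1, y1) in original-image coordinates."""
    boxes = set()
    scale = 1
    for _ in range(levels):
        if len(img) < win or len(img[0]) < win:
            break
        mask = edge_density_mask(img, win, thresh)
        size = win * scale
        for by, row in enumerate(mask):
            for bx, hit in enumerate(row):
                if hit:
                    boxes.add((bx * size, by * size,
                               (bx + 1) * size, (by + 1) * size))
        img = downsample(img)
        scale *= 2
    return sorted(boxes)
```

Calling `detect_multiscale` on a grayscale image given as a list of rows returns the blocks that survive the density filter at any scale; coarser levels respond to larger strokes, which is the motivation the abstract gives for the pyramid.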
Other Abstract: With the rapid advances in digital technology, more and more databases are multimedia in nature, containing images and video in addition to textual information. Automatic understanding of image and video content by computer is attracting growing attention from researchers in China and abroad. Text is an attractive feature for video annotation and indexing because it provides rich semantic information about the video. It is therefore an urgent and challenging task to develop a framework that can effectively detect, extract, and recognize text against complex backgrounds. With this goal, the following research work has been conducted.

1. This thesis presents a text detection method based on edge density and a pyramid strategy. First, an edge map is computed from the original image and used to filter out non-text regions according to the edge density feature. Then connected component analysis, vertical projection, and horizontal projection are applied to obtain candidate text lines. Pyramid decomposition is used to detect text of different sizes. Finally, the candidate text lines from all scales are fused to give the final result. Experimental results show that the method is efficient and is not affected by text size, illumination, and similar factors.

2. This thesis presents a second text detection method based on sparse representation. The method uses a coarse-to-fine text detection framework. In the coarse detection stage, a fast edge density filter and connected component analysis are used to obtain candidate text lines. In the fine detection stage, the candidate text lines are verified with a non-complete dictionary generated by sparse representation classification. Preliminary experiments show that this method gives better results than traditional methods.

3. It is difficult to extract text from video images with conventional binarization methods because of the complex background. We therefore use connected component analysis and histograms to convert the video image into a picture of white text on a black background, or black text on a white background, which OCR software can recognize easily. Preliminary experiments show that this method performs well when there is an obvious contrast between the text color and the background color.
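The goal of the third item, producing a fixed text polarity for OCR, can be illustrated with a small sketch. It is not the thesis method: Otsu's threshold is swapped in as one common histogram-based choice, the connected component analysis is left out, and the rule that the majority class on the image border is background is an assumption, as are the function names.

```python
def otsu_threshold(pixels):
    """Histogram-based threshold maximizing between-class variance (Otsu)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def to_white_on_black(img):
    """Binarize a grayscale text image so that text is 255 and background is 0."""
    t = otsu_threshold([p for row in img for p in row])
    bright = [[p > t for p in row] for row in img]
    # Assumption: the class that dominates the image border is the background.
    border = (bright[0] + bright[-1]
              + [row[0] for row in bright] + [row[-1] for row in bright])
    background_is_bright = sum(border) * 2 > len(border)
    return [[255 if is_bright != background_is_bright else 0
             for is_bright in row] for row in bright]
```

With this rule, dark text on a light background and light text on a dark background map to the same white-on-black output, which matches the contrast condition the abstract states for the method to work well.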
Shelf Number: XWLW1533
Other Identifier: 200728014628033
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/7526
Collection: Graduates_Master's Theses
Recommended Citation
GB/T 7714
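李心洁. 视频文字检测与抽取技术研究 (Research on Video Text Detection and Extraction Techniques)[D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences, 2010.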
Files in This Item:
File Name/Size: CASIA_20072801462803 (3354KB)
Access: Not yet open
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.