Video OCR Research


	Video OCR Research
Alternative title	Video OCR Research
	王修飞
	2011-05-28
Degree type	Doctor of Engineering
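Chinese abstract	With the rapid development of information technology and the Internet, video has gradually become an important medium for obtaining and conveying information. Text in video is high-level semantic information that provides important auxiliary cues for video indexing and retrieval. If the text in a video can be accurately located and recognized, the recognition results can be used for content-based video storage, annotation, and retrieval. This thesis is devoted to Video OCR research, whose main goal is to extract the text information in video; specifically, it covers video text localization, video text tracking, and video text segmentation and recognition. The main contributions are as follows: 1) Four databases for video text recognition research are built and annotated: CASIA-TRAIN, CASIA-IMAGE, CASIA-TEXT, and CASIA-VIDEO. They are used, respectively, for training text texture classifiers, video text localization, video text segmentation, and video text recognition, and corresponding evaluation criteria are given. 2) A measure of the background complexity of video text is proposed, together with an approximate method for computing it. Based on this measure, a text localization method based on background classification is proposed; its central idea is a divide-and-conquer strategy in which video text with different background complexity is handled by different localization methods. Experiments show that the proposed method achieves good localization results for video text across different levels of background complexity. 3) For complex backgrounds, a block-based texture feature is proposed for precise text localization. The text region is first divided into 8*8 blocks, and then a gray-scale contrast feature (GSC) and an edge orientation histogram feature (EOH) are extracted from each sub-block. The GSC feature mainly serves to remove interference from complex backgrounds, while the EOH feature describes the overall texture characteristics of the text. Comparative experiments with other features show that the proposed feature is highly discriminative and yields fairly accurate text positions. 4) For video text segmentation, a text segmentation method based on stroke and color is proposed. First, candidate text regions are extracted with a stroke operator; second, the pixels of the video text are modeled with a Gaussian model according to the candidate text regions, and the text image is segmented with this model; finally, non-text noise is filtered out through local color consistency analysis. Experiments show that the method is robust to non-text noise.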
English abstract	With the development of information technology, video has become an important medium for transmitting and obtaining information. Text in videos is a powerful source of high-level semantics and can provide useful cues for video logging and indexing. If these text occurrences could be detected, segmented, and recognized automatically, they could be used for content-based video search, automatic video logging, text-based video indexing, and so on. This thesis focuses on Video OCR research, which aims at extracting the text information in videos and includes video text localization, tracking, segmentation, and recognition. The main work of this thesis is as follows: 1) Four datasets are established and labeled: CASIA-TRAIN, CASIA-IMAGE, CASIA-TEXT, and CASIA-VIDEO, for research on the training and testing of text texture classifiers, text localization, text segmentation, and video text recognition. 2) We introduce the Text-Noise-Ratio (TNR) as a measure of the complexity of the text background, and give an approximate calculation of TNR for the case where the text region is unknown. Based on this, a novel text localization method is proposed. The basic idea of this method is to handle text on different backgrounds with different methods. Experimental results show that our method performs well under different backgrounds. 3) We propose a new texture feature for precise text detection. First, the text region is partitioned into 64 blocks; then, in each block, the GSC feature is extracted to eliminate the influence of the background, and the EOH feature is extracted to express the statistical characteristics of the text. Comparison experiments show that the proposed feature is effective for precise text detection. 4) For text segmentation, we propose a new method based on stroke and color. First, candidate text regions are extracted by the stroke operator to obtain a model of the text color distribution; second, a coarse segmentation is carried out with the color model; finally, text color consistency is used to eliminate noise in the coarse segmentation. Experimental results show that our method is robust to non-text noise.
Keywords	Video Text Detection; Video Text Tracking; Video Text Segmentation; Video Text Recognition
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6364
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	王修飞. Video OCR研究[D]. 中国科学院自动化研究所. 中国科学院研究生院,2011.
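
Contribution 3 of the abstract describes splitting a text region into 64 blocks and extracting an edge orientation histogram (EOH) from each block. The record does not give the thesis's exact definition, so the following is only a minimal sketch of a generic block-wise EOH. The function name `block_eoh`, the 8-bin quantization, the central-difference gradient, and the per-block normalization are all assumptions made for illustration; the GSC feature is not reproduced because the abstract does not define it.

```python
import numpy as np


def block_eoh(gray, grid=(8, 8), bins=8):
    """Block-wise edge orientation histogram (illustrative, not the thesis's definition).

    gray : 2-D array holding a grayscale text region.
    grid : number of blocks along rows and columns (8 x 8 = 64 blocks).
    bins : number of orientation bins per block (an assumed value).
    Returns a 1-D feature vector of length grid[0] * grid[1] * bins.
    """
    gray = np.asarray(gray, dtype=np.float64)
    # np.gradient returns the derivative along rows first, then along columns.
    gy, gx = np.gradient(gray)
    magnitude = np.hypot(gx, gy)
    # Fold orientation into [0, pi) so opposite gradient directions share a bin.
    orientation = np.mod(np.arctan2(gy, gx), np.pi)
    bin_index = np.minimum((orientation / np.pi * bins).astype(int), bins - 1)

    rows, cols = gray.shape
    row_edges = np.linspace(0, rows, grid[0] + 1).astype(int)
    col_edges = np.linspace(0, cols, grid[1] + 1).astype(int)

    features = []
    for r in range(grid[0]):
        for c in range(grid[1]):
            block = (slice(row_edges[r], row_edges[r + 1]),
                     slice(col_edges[c], col_edges[c + 1]))
            # Magnitude-weighted histogram of orientations inside this block.
            hist = np.bincount(bin_index[block].ravel(),
                               weights=magnitude[block].ravel(),
                               minlength=bins)
            total = hist.sum()
            # Normalize each block so the feature is insensitive to contrast.
            features.append(hist / total if total > 0 else hist)
    return np.concatenate(features)
```

With the default arguments a region yields a 512-dimensional vector (64 blocks times 8 bins), which could then be passed to any texture classifier; the choice of classifier is not stated in this record.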

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20081801462909 (6712KB)			Not yet open	CC BY-NC-SA
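
Contribution 4 of the abstract models text pixel colors with a Gaussian fitted on candidate regions and then segments the text image with that model. The sketch below illustrates only that middle step, under stated assumptions: a single Gaussian over the color channels, a candidate mask supplied by the caller (standing in for the stroke operator's output, which is not reproduced here), and a Mahalanobis-distance threshold. The names `fit_color_gaussian` and `segment_by_color` and the parameters `reg` and `max_distance` are hypothetical, and the final local color consistency filtering is not included.

```python
import numpy as np


def fit_color_gaussian(image, candidate_mask, reg=1e-3):
    """Fit a single Gaussian to the colors of candidate text pixels.

    image          : (H, W, C) color image.
    candidate_mask : (H, W) boolean mask of candidate text pixels
                     (assumed to come from some stroke-based detector).
    reg            : small diagonal term keeping the covariance invertible.
    """
    pixels = np.asarray(image, dtype=np.float64)[np.asarray(candidate_mask, dtype=bool)]
    mean = pixels.mean(axis=0)
    cov = np.cov(pixels, rowvar=False) + reg * np.eye(pixels.shape[1])
    return mean, cov


def segment_by_color(image, mean, cov, max_distance=3.0):
    """Label a pixel as text if its color lies close to the Gaussian model."""
    image = np.asarray(image, dtype=np.float64)
    diff = image.reshape(-1, image.shape[-1]) - mean
    # Squared Mahalanobis distance of every pixel to the text color model.
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return (d2 <= max_distance ** 2).reshape(image.shape[:2])
```

The candidate mask needs at least two pixels for the covariance estimate. The threshold of 3.0 is an arbitrary illustrative value; the record does not say how the thesis chooses the decision rule.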