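Thesis Advisor: 吴显礼
Degree Grantor: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Place of Conferral: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Degree Discipline: Pattern Recognition and Intelligent Systems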
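Abstract: Layout analysis is an important component of an OCR system. It segments a document image into regions according to certain features and classifies each region as text, title, image, graphic, table, and so on. We call such regions layout primitives. The primitives produced by layout analysis are handled by different methods in later stages: for example, text primitives are passed to a character recognizer and table primitives to a dedicated table recognizer. In this thesis, we divide the layout analysis system, following the order of processing, into image preprocessing, skew correction, layout segmentation, and layout understanding. We describe the basic ideas and algorithms of each part in detail, and along the way introduce the techniques used in the layout analysis system of Hanwang HW_OCR. That system handles Chinese documents containing a single article. Its main techniques are erosion-dilation denoising and smoothing, nearest-neighbor skew angle detection, layout segmentation using separators, and layout understanding based on typesetting rules. To process complex Chinese layouts quickly and effectively (mixed horizontal and vertical text, with no assumption that primitives are rectangular), we propose a layout segmentation method based on separators. The method combines top-down and bottom-up strategies, and segments the page using both the real separators of the typeset article and the virtual separators formed by differences in local features. It is robust to skew, adapts well to varied layouts, and runs fast. Finally, we present a practical layout analysis system built on these methods together with experimental results, and offer suggestions for future improvement.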
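The erosion-dilation denoising named in the abstract can be illustrated with a morphological opening on a binary image. The following is a minimal sketch, not the thesis's implementation: the function names (`erode`, `dilate`, `open_binary`), the square structuring element, and the list-of-lists image representation are all assumptions made for illustration.

```python
def erode(img, k=1):
    """Binary erosion with a (2k+1)x(2k+1) square (assumed structuring element).

    A pixel survives only if every pixel in its neighbourhood is foreground;
    positions outside the image count as background.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(all(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                for dy in range(-k, k + 1)
                for dx in range(-k, k + 1)
            ))
    return out


def dilate(img, k=1):
    """Binary dilation: a pixel is set if any neighbour within the square is set."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(any(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                for dy in range(-k, k + 1)
                for dx in range(-k, k + 1)
            ))
    return out


def open_binary(img, k=1):
    """Opening = erosion followed by dilation; removes specks smaller than the square."""
    return dilate(erode(img, k), k)
```

Applied to a page image, an opening of this kind removes isolated noise pixels while leaving strokes at least as wide as the structuring element largely intact. Thin strokes would also be removed, so the element size would need to be chosen against the scan resolution.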
Other Abstract: Document analysis is an important part of an OCR system. It segments the document image into several parts and classifies each part as text, title, image, drawing, table, etc. We call these parts blocks. The blocks obtained by document analysis are treated differently: for example, text blocks are processed by a character recognition engine and tables by a table recognition engine. In this thesis, the document analysis system consists of several parts: image preprocessing, deskewing, document segmentation, and document understanding. The basic ideas and algorithms are introduced in detail, and some techniques applied in the HW_OCR document analysis system are presented. The HW_OCR system mainly processes Chinese documents containing a single article, and its main methods include morphological filtering, nearest-neighbour skew angle detection, document segmentation using separators, and document understanding based on typesetting rules. To deal quickly and effectively with complex Chinese documents (mixed horizontal and vertical text, non-rectangular blocks), we propose a document segmentation method using separators. The method integrates top-down and bottom-up strategies and segments the document using all kinds of separators. It is robust to skew, highly adaptable, and fast. Finally, experimental results are given and some suggestions for future research are provided.
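As an illustration of nearest-neighbour skew angle detection, the sketch below uses one common formulation: each connected component votes with the direction to its nearest neighbour, and the most frequent direction is taken as the skew. This is an assumption about the general technique, not a reproduction of the thesis's algorithm; the function name `estimate_skew`, the centroid input, and the 1-degree histogram bins are hypothetical choices.

```python
import math
from collections import Counter


def estimate_skew(centroids, bin_deg=1.0):
    """Estimate the page skew in degrees from component centroids.

    centroids: list of (x, y) points, e.g. character bounding-box centres
    (assumed input; extracting components is outside this sketch).
    Returns the centre of the most-voted angle bin, in [-90, 90).
    """
    votes = Counter()
    for i, (x1, y1) in enumerate(centroids):
        best_d, best_vec = math.inf, None
        for j, (x2, y2) in enumerate(centroids):
            if i == j:
                continue
            d = (x2 - x1) ** 2 + (y2 - y1) ** 2
            if d < best_d:
                best_d, best_vec = d, (x2 - x1, y2 - y1)
        if best_vec is None:
            continue  # fewer than two components: nothing to vote with
        angle = math.degrees(math.atan2(best_vec[1], best_vec[0]))
        # A text line has no direction, so fold angles into [-90, 90).
        if angle >= 90:
            angle -= 180
        elif angle < -90:
            angle += 180
        votes[round(angle / bin_deg)] += 1
    if not votes:
        return 0.0
    return votes.most_common(1)[0][0] * bin_deg
```

The sketch relies on characters within a line being closer to each other than to characters on adjacent lines. For pages that mix horizontal and vertical text, as in the documents the thesis targets, the histogram would likely show two peaks about 90 degrees apart, and this simple mode-picking step would need to be extended.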
Other Identifier: 529
Document Type: Thesis
Recommended Citation
GB/T 7714
江世盛. Chinese Document Layout Analysis (中文版面分析) [D]. Institute of Automation, Chinese Academy of Sciences, 1999.