Knowledge Commons of Institute of Automation, CAS
Layout Analysis of Complex Document Images (复杂文档图像版面分析) | |
李晓辉 | |
2021-06-01 | |
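Pages | 156 |
Degree type | Doctoral |
Chinese abstract (translated) | The task of document image layout analysis is to segment a document image into regions of different categories and to analyze the logical relationships among them. With the development of deep learning, the accuracy of single-character recognition and text-string recognition keeps improving, and layout analysis of complex document images has gradually become the bottleneck that limits the performance of document analysis systems. This thesis studies the main problems of document image layout analysis in depth; the main results are as follows: |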
English abstract | Layout analysis segments document images into regions of different types and analyzes the logical relationships among them. With the development of deep learning, the accuracy of single-character recognition and text-string recognition keeps improving, leaving layout analysis of complex document images as a bottleneck that restricts the performance of document analysis systems. This thesis studies the main problems of document image layout analysis and reports the following results: (1) A method for document image primitive classification based on graphical models is proposed. It combines deep neural networks with probabilistic graphical models: convolutional neural networks (CNN) extract features from primitive regions, and conditional random fields (CRF) perform contextual classification of the primitives. Connected components (CC) are used as primitives in the experiments. By combining the feature extraction ability of CNN with the structured prediction ability of CRF, the proposed method substantially improves the performance of document primitive region classification. (2) A bottom-up document image region segmentation method based on graphical models is proposed. The method treats the connected components of the image as the nodes of a graph and uses a CNN to extract features for the nodes and for the edges between adjacent primitive regions. Based on the node and edge classification results given by graph convolutional networks (GCN), primitive regions that belong to the same class and the same region are aggregated into complete segmented regions. Compared with traditional page segmentation methods and with deep learning methods based on general object detection frameworks, the proposed method achieves better region segmentation performance, robustness, and versatility. (3) A document image region segmentation method based on instance segmentation is proposed. The method transforms the category labels of image regions into label pyramids (LP) through distance transformation and multi-threshold binarization, and trains fully convolutional networks (FCN) in a multi-task learning manner. At test time, the multi-task outputs are averaged into a probability score map, on which watershed segmentation is performed to obtain the page regions. This avoids the erroneous merging of adjacent regions that touch or overlap, and yields superior segmentation performance for regions of arbitrary shape and for various types of document images. (4) A GCN-based method for two-dimensional structure parsing of document images is proposed. For two-dimensionally structured content such as formulas and tables, the method first uses a CNN to detect formula symbols or table cells (called primitives), and then uses a GCN to classify the primitives and the relationships between adjacent primitives, producing a complete structure recognition result. Compared with other table recognition and formula recognition methods, the proposed method yields comparable or better performance and offers better versatility and interpretability. |
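Keywords | document image; layout analysis; region classification; region segmentation; two-dimensional structure parsing |
Language | Chinese |
Document type | Dissertation |
Item identifier | http://ir.ia.ac.cn/handle/173211/45031 |
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems_Pattern Analysis and Learning |
Corresponding author | 李晓辉 |
Recommended citation (GB/T 7714) | 李晓辉. 复杂文档图像版面分析[D]. 中国科学院自动化研究所. 中国科学院大学,2021. |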
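The abstract describes label pyramids as the result of a distance transformation followed by multi-threshold binarization. The sketch below illustrates that idea on a single binary region mask. It is only a sketch under stated assumptions, not the thesis's implementation: the city-block distance, the evenly spaced thresholds, and the names `distance_to_background` and `label_pyramid` are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
from typing import List

Grid = List[List[int]]


def distance_to_background(mask: Grid) -> Grid:
    """City-block distance from each foreground pixel to the nearest
    background pixel, computed with a two-pass scan.
    Assumption: pixels outside the image count as background."""
    h, w = len(mask), len(mask[0])
    inf = h + w  # larger than any possible city-block distance here
    dist = [[inf if mask[y][x] else 0 for x in range(w)] for y in range(h)]
    # Forward pass: propagate distances from the top and left neighbours.
    for y in range(h):
        for x in range(w):
            if dist[y][x]:
                up = dist[y - 1][x] if y > 0 else 0
                left = dist[y][x - 1] if x > 0 else 0
                dist[y][x] = min(dist[y][x], up + 1, left + 1)
    # Backward pass: propagate distances from the bottom and right neighbours.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if dist[y][x]:
                down = dist[y + 1][x] if y < h - 1 else 0
                right = dist[y][x + 1] if x < w - 1 else 0
                dist[y][x] = min(dist[y][x], down + 1, right + 1)
    return dist


def label_pyramid(mask: Grid, num_levels: int = 3) -> List[Grid]:
    """Binarize the distance map at several thresholds.
    Level 0 is the full region; higher levels keep only pixels that lie
    deeper inside it. Assumption: thresholds are evenly spaced between
    0 and the maximum distance."""
    dist = distance_to_background(mask)
    peak = max(max(row) for row in dist)
    levels = []
    for k in range(num_levels):
        threshold = k * peak / num_levels
        levels.append([[1 if d > threshold else 0 for d in row] for row in dist])
    return levels


if __name__ == "__main__":
    region = [[1] * 5 for _ in range(5)]  # a toy 5x5 region mask
    for level in label_pyramid(region, num_levels=3):
        print(sum(map(sum, level)))  # foreground pixel count per level
```

For the toy 5x5 mask, the three levels contain 25, 9, and 1 foreground pixels. The higher levels shrink each region toward its core, which is what would let touching regions be separated before a step such as watershed grows them back.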
Files in this item | ||||||
File name/size | Document type | Version type | Access type | License | ||
复杂文档图像版面分析-李晓辉-20210(24760KB) | Dissertation | | Open access | CC BY-NC-SA |