Layout Analysis of Complex Document Images (复杂文档图像版面分析)

	Layout Analysis of Complex Document Images
	Li Xiaohui (李晓辉)
	2021-06-01
Pages	156
Degree type	Doctoral
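Chinese abstract	Document image layout analysis aims to segment a document image into regions of different classes and to analyze the logical relationships among them. With the development of deep learning, the accuracy of single-character recognition and text-string recognition keeps improving, and layout analysis of complex document images has gradually become the bottleneck that limits the performance of document analysis systems. This thesis studies the main problems of document image layout analysis in depth. The main contributions are as follows. (1) A graphical-model-based method for classifying document image primitives is proposed. It combines deep neural networks with probabilistic graphical models: a convolutional neural network extracts features from primitive regions, and a conditional random field classifies the primitives in context. The experiments use image connected components as primitives. Because it combines the feature extraction ability of convolutional neural networks with the structured prediction ability of conditional random fields, the method considerably improves the classification of document primitive regions. (2) A graphical-model-based, bottom-up method for document image region segmentation is proposed. It takes image connected components as primitives, uses a deep convolutional neural network to extract features of the primitive regions and of the edges between adjacent primitive regions, and uses a graph convolutional network to classify the nodes and edges. Based on these classification results, primitive regions that share a class and belong to the same region are aggregated into the complete region segmentation. Compared with traditional page segmentation methods and with deep learning methods built on general object detection frameworks, the method achieves better region segmentation performance and is more robust and more general. (3) An instance-segmentation-based method for document image region segmentation is proposed. It converts the class labels of image regions into a label pyramid through distance transformation and multi-threshold operations, and trains a fully convolutional network by multi-task learning. At test time, the outputs of the tasks are averaged into a probability score map, on which watershed segmentation yields the target regions. The method overcomes the erroneous merging of regions caused by touching or overlapping adjacent regions, and segments regions of arbitrary shape and many types of documents well. (4) A graph-convolutional-network-based method for parsing the two-dimensional structure of document images is proposed. For two-dimensionally structured documents such as formulas and tables, the method first detects formula symbols or table cells (called primitives) with a deep convolutional neural network, and then uses a graph convolutional network to classify the primitives and the relationships between adjacent primitives, which gives the complete structure recognition result. Compared with other table recognition or formula recognition methods, the method achieves comparable or better recognition performance and is more general and more interpretable.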
English abstract	Layout analysis segments document images into regions of different types and analyzes the logical relationships among them. With the development of deep learning, the accuracy of single-character recognition and text-string recognition keeps increasing, which leaves the layout analysis of complex document images as a bottleneck restricting the performance of document analysis systems. This thesis studies the main problems of document image layout analysis and achieves the following results. (1) A method for document image primitive classification based on graphical models is proposed. The method combines deep neural networks and probabilistic graphical models: convolutional neural networks (CNN) extract features from primitive regions, and conditional random fields (CRF) perform contextual classification of the primitives. Connected components (CC) are taken as primitives in the experiments. By combining the feature extraction ability of CNN with the structured prediction ability of CRF, the proposed method considerably improves the performance of document primitive region classification. (2) A bottom-up document image region segmentation method based on graphical models is proposed. The method treats the connected components of the image as the nodes of a graph, and uses CNN to extract features for the nodes and for the edges between adjacent primitive regions. According to the node and edge classification results given by graph convolutional networks (GCN), primitive regions belonging to the same class and the same region are aggregated to obtain the complete region segmentation. Compared with traditional page segmentation methods and deep learning methods based on general object detection frameworks, the proposed method achieves better region segmentation performance, robustness, and versatility. (3) A method for document image region segmentation based on instance segmentation is proposed. The method transforms the category labels of image regions into label pyramids (LP) through distance transformation and multi-threshold binarization, and trains fully convolutional networks (FCN) in a multi-task learning manner. At test time, the multi-task outputs are averaged to obtain a probability score map, on which watershed segmentation is performed to obtain the page regions. The method overcomes the erroneous merging of regions caused by touching or overlapping adjacent regions, and segments regions of arbitrary shape and various types of document images well. (4) A GCN-based method for two-dimensional structure parsing of document images is proposed. For two-dimensionally structured documents such as formulas and tables, the method first uses CNN to detect formula symbols or table cells (called primitives), and then uses GCN to classify the primitives and the relationships between adjacent primitives, which gives the complete structure recognition result. Compared with other table recognition or formula recognition methods, the proposed method yields comparable or better performance and also has better versatility and interpretability.
Keywords	document image layout analysis; region classification; region segmentation; two-dimensional structure parsing
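Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/45031
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems_Pattern Analysis and Learning
Corresponding author	Li Xiaohui (李晓辉)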
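Recommended citation (GB/T 7714)	Li Xiaohui. Layout Analysis of Complex Document Images [D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2021.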
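
The bottom-up segmentation step described in the abstract (aggregating primitives that share a class and are linked as belonging to the same region) can be illustrated with a small union-find sketch. This is not the thesis implementation: the function name `aggregate_primitives`, the input layout, and the rule "merge when the edge is predicted as same-region and both node classes agree" are assumptions made for illustration, and the node and edge predictions are taken as given rather than produced by a GCN.

```python
from collections import defaultdict


def aggregate_primitives(boxes, node_labels, edges, edge_same_region):
    """Group primitives into regions (hypothetical helper, for illustration only).

    boxes: list of (x0, y0, x1, y1) bounding boxes, one per primitive
    node_labels: predicted class of each primitive
    edges: list of (i, j) index pairs linking adjacent primitives
    edge_same_region: list of bools, True if the edge is predicted to
        connect two primitives of the same region
    """
    parent = list(range(len(boxes)))

    def find(i):
        # Find the set representative, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Assumed merge rule: positive edge AND identical node classes.
    for (i, j), same in zip(edges, edge_same_region):
        if same and node_labels[i] == node_labels[j]:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri

    groups = defaultdict(list)
    for i in range(len(boxes)):
        groups[find(i)].append(i)

    regions = []
    for members in groups.values():
        # The region box is the union of its members' boxes.
        x0 = min(boxes[m][0] for m in members)
        y0 = min(boxes[m][1] for m in members)
        x1 = max(boxes[m][2] for m in members)
        y1 = max(boxes[m][3] for m in members)
        regions.append({
            "label": node_labels[members[0]],
            "members": members,
            "bbox": (x0, y0, x1, y1),
        })
    return regions


# Toy input: three text primitives and one figure primitive.
boxes = [(0, 0, 10, 10), (12, 0, 20, 10), (0, 50, 10, 60), (30, 0, 40, 10)]
node_labels = ["text", "text", "text", "figure"]
edges = [(0, 1), (1, 2), (1, 3)]
edge_same_region = [True, False, True]
regions = aggregate_primitives(boxes, node_labels, edges, edge_same_region)
```

In this toy input, primitives 0 and 1 merge into one text region, primitive 2 stays separate because its edge is negative, and primitive 3 stays separate because its class differs even though its edge is positive.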

Files in this item
File name/size	Document type	Version type	Access type	License
复杂文档图像版面分析-李晓辉-20210（24760KB）	Thesis		Open access	CC BY-NC-SA
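
The label pyramid mentioned in the abstract (distance transformation followed by multi-threshold binarization) can be sketched in plain Python. The abstract does not give the details, so the following are assumptions for illustration: the distance is city-block, it is computed separately for each region instance, pixels outside the image count as background, and the thresholds are 1, 2, 3. The names `distance_transform`, `label_pyramid`, and `average_levels` are hypothetical. Averaging is applied here to the target levels themselves, as a stand-in for averaging network outputs, and the watershed step is not shown.

```python
def distance_transform(mask):
    """City-block distance from each foreground pixel to the nearest
    background pixel, via the classic two-pass scan.
    Assumption: everything outside the image is background."""
    h, w = len(mask), len(mask[0])
    inf = h + w  # upper bound on any city-block distance in the image
    dist = [[inf if mask[y][x] else 0 for x in range(w)] for y in range(h)]
    # Forward pass: propagate distances from the top and left.
    for y in range(h):
        for x in range(w):
            if dist[y][x]:
                up = dist[y - 1][x] if y > 0 else 0
                left = dist[y][x - 1] if x > 0 else 0
                dist[y][x] = min(dist[y][x], up + 1, left + 1)
    # Backward pass: propagate distances from the bottom and right.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if dist[y][x]:
                down = dist[y + 1][x] if y < h - 1 else 0
                right = dist[y][x + 1] if x < w - 1 else 0
                dist[y][x] = min(dist[y][x], down + 1, right + 1)
    return dist


def label_pyramid(instance_map, thresholds=(1, 2, 3)):
    """Build one binary mask per threshold; thresholds must be >= 1.
    instance_map holds 0 for background and a positive id per region."""
    h, w = len(instance_map), len(instance_map[0])
    pyramid = [[[0] * w for _ in range(h)] for _ in thresholds]
    region_ids = sorted({v for row in instance_map for v in row if v != 0})
    for rid in region_ids:
        mask = [[1 if v == rid else 0 for v in row] for row in instance_map]
        dist = distance_transform(mask)
        for k, t in enumerate(thresholds):
            for y in range(h):
                for x in range(w):
                    if dist[y][x] >= t:
                        pyramid[k][y][x] = 1
    return pyramid


def average_levels(levels):
    """Pixel-wise mean over the levels, giving a score map in [0, 1]."""
    n = len(levels)
    h, w = len(levels[0]), len(levels[0][0])
    return [[sum(level[y][x] for level in levels) / n for x in range(w)]
            for y in range(h)]


# Two touching 5x5 regions side by side in a 5x10 image.
instance_map = [[1] * 5 + [2] * 5 for _ in range(5)]
pyramid = label_pyramid(instance_map)
score = average_levels(pyramid)
```

In this example the lowest level covers both regions as one blob, while the higher levels shrink each region toward its core, so the two touching regions become separate. That separation is what allows a seeded segmentation such as watershed to split them.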