Dynamic Layout Analysis of Online Handwritten Documents
杨宇婷
2022-05-29
Pages: 87
Degree Type: Master's
Chinese Abstract

Online handwritten documents are an important type of media data, widely used in human-computer interaction, online education, automated office work, and other fields. Layout analysis of online handwritten documents refers to dividing strokes into different semantic categories, such as text, formulas, tables, lists, flowcharts, and sketches. For document analysis systems that support free-form handwriting, layout analysis is a fundamental task. Previous methods are essentially static: they rely on global context modeling and must wait for the user to finish the entire document before making predictions. In practice, however, the more user-friendly approach is to predict in real time while the user is writing. This thesis therefore studies dynamic layout analysis of online handwritten documents that mix text and graphics, with the goal of analyzing document content dynamically and in real time during handwriting input, providing a basis for dynamic recognition. The main research contents and results are as follows:
1. We collected and released CASIA-onDo, currently the largest online handwritten document dataset with the most semantic categories and the most complex structures. It provides both semantic-level and instance-level labels, and we conducted a series of benchmark experiments on it to promote research in document analysis. The dataset is available at http://www.nlpr.ia.ac.cn/databases/CASIA-onDo/.
2. We proposed DyGAT, a dynamic stroke classification method for online handwritten documents based on a multi-feature graph. The method builds on graph neural networks: nodes correspond to strokes, edges correspond to spatio-temporal neighborhood relations between strokes, and stroke classification is modeled as node classification in the graph. Stroke features are learned automatically by a neural network, forming an end-to-end learning framework. The core idea of DyGAT is to construct a multi-feature graph and use the message-passing mechanism of graph networks to design an algorithm that fully exploits incomplete context (a minimal graph-attention sketch follows this list). Experiments on public handwriting datasets from multiple domains verify the effectiveness of the method for dynamic stroke classification.
3. We proposed a chunk-based streaming Transformer model to classify strokes in online handwritten documents in real time. The method builds on the Transformer encoder and models stroke sequences with the attention mechanism. By splitting the sequence into chunks and designing an effective attention-mask strategy to limit context, the model gains the ability to condition on only a bounded context. Experiments on the online handwritten document dataset IAMonDo and the online handwritten flowchart dataset FC show that, compared with previous methods, this method achieves comparable performance with lower training memory requirements and faster inference.
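To make the graph formulation in item 2 concrete, the following is a minimal sketch, in PyTorch, of graph-attention node classification over strokes. The class names (GATLayer, StrokeGAT) and the toy chain adjacency are illustrative assumptions; the thesis's DyGAT additionally builds a multi-feature graph that assigns several nodes to one stroke to control information flow, which this sketch does not reproduce.

```python
# A minimal sketch of graph-attention stroke classification (hypothetical
# names; not the thesis's exact DyGAT architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single graph-attention layer: each stroke node aggregates
    messages from its spatio-temporal neighbours."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):
        # x: (N, in_dim) node features; adj: (N, N) 0/1 adjacency.
        h = self.W(x)                                   # (N, out_dim)
        N = h.size(0)
        # Pairwise attention logits e_ij = a([h_i || h_j]).
        hi = h.unsqueeze(1).expand(N, N, -1)
        hj = h.unsqueeze(0).expand(N, N, -1)
        e = F.leaky_relu(self.a(torch.cat([hi, hj], dim=-1)).squeeze(-1))
        # Mask non-edges so attention flows only along the stroke graph.
        e = e.masked_fill(adj == 0, float('-inf'))
        alpha = torch.softmax(e, dim=-1)
        return F.elu(alpha @ h)

class StrokeGAT(nn.Module):
    """Stacks two attention layers and classifies every stroke node."""
    def __init__(self, in_dim, hid_dim, n_classes):
        super().__init__()
        self.gat1 = GATLayer(in_dim, hid_dim)
        self.gat2 = GATLayer(hid_dim, hid_dim)
        self.cls = nn.Linear(hid_dim, n_classes)

    def forward(self, x, adj):
        h = self.gat1(x, adj)
        h = self.gat2(h, adj)
        return self.cls(h)                              # (N, n_classes) logits

# Toy usage: 5 strokes, 16-d features, chain-like temporal adjacency
# (each stroke connected to its temporal neighbours plus a self-loop).
x = torch.randn(5, 16)
adj = torch.eye(5) + torch.diag(torch.ones(4), 1) + torch.diag(torch.ones(4), -1)
logits = StrokeGAT(16, 32, n_classes=6)(x, adj)
print(logits.shape)  # torch.Size([5, 6])
```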
 

English Abstract

Online handwritten documents are an important type of media data, used in human-machine interfaces, education, office automation, and other applications. Layout analysis for online handwritten documents aims to divide strokes into semantic categories such as text, formula, table, diagram, and graph. It is an essential component of document analysis systems that support free writing. Previous methods rely on global context, so they are essentially static: they have to wait for the user to finish the whole document before making predictions. In practice, however, the more user-friendly way is to make real-time predictions as the user is writing. Therefore, this thesis studies dynamic layout analysis of online handwritten documents that mix text and graphics. The purpose is to analyze the document content dynamically during handwriting input, so as to provide a basis for dynamic recognition. The main research contents and results are as follows:
1. We constructed and released the CASIA-onDo database, the largest online handwritten multi-content document database so far, with various writing styles and complex structures. Semantic-level and instance-level annotations are provided. We also performed multiple semantic segmentation experiments on CASIA-onDo and provide the results as baselines.
2. We proposed the Dynamic Graph ATtention network (DyGAT), a novel end-to-end framework comprising feature extraction, graph building, and stroke classification, to solve the dynamic stroke classification problem. The core of the method is to formalize a document or sketch as a multi-feature graph, in which nodes represent strokes, edges represent the relations between strokes, and multiple nodes are assigned to each stroke to control the information flow. The method is general and applies to online handwritten data of many types. Semantic segmentation experiments on popular public datasets show competitive performance.

3. We proposed a chunk-based streaming Transformer model for real-time classification of strokes in online handwritten documents. The method is based on the Transformer encoder and uses the attention mechanism to model stroke sequences. The input sequence is divided into chunks, and an effective attention-mask strategy limits the available context, so that the model only ever conditions on a bounded context (a minimal sketch of such a mask follows this list). Experiments on multiple online handwritten document datasets show that the method achieves performance comparable to previous methods with lower training memory requirements and faster inference speed.
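To illustrate the chunk-wise masking idea in item 3, here is a minimal sketch, in PyTorch, of an attention mask that lets each position attend only to its own chunk and a bounded number of preceding chunks. The function name chunk_mask, the chunk size, and the left-context width are illustrative assumptions, not the thesis's exact masking strategy or model.

```python
# A minimal sketch of a chunk-wise streaming attention mask (hypothetical
# names and parameters; not the thesis's exact strategy).
import torch

def chunk_mask(seq_len, chunk_size, left_chunks):
    """Boolean mask (True = may attend): position i sees every position
    in its own chunk and in up to `left_chunks` preceding chunks, so the
    encoder never waits for future strokes."""
    idx = torch.arange(seq_len) // chunk_size   # chunk id of each position
    qi = idx.unsqueeze(1)                       # query chunk ids, (L, 1)
    ki = idx.unsqueeze(0)                       # key chunk ids,   (1, L)
    return (ki <= qi) & (ki >= qi - left_chunks)

mask = chunk_mask(seq_len=8, chunk_size=2, left_chunks=1)
# Feed into a standard encoder layer; PyTorch's src_mask uses the opposite
# convention (True = masked out), hence the inversion below.
layer = torch.nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
x = torch.randn(1, 8, 32)                       # 8 stroke embeddings
y = layer(x, src_mask=~mask)                    # (1, 8, 32)
print(mask.int())
```

Because the mask never admits future chunks, the encoder can run incrementally as strokes arrive; increasing left_chunks trades memory and latency for more context.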
 

Keywords: Online Handwritten Documents; Layout Analysis; Dynamic Stroke Classification; Graph Neural Networks; Transformer
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/48860
Collection: 多模态人工智能系统全国重点实验室_模式分析与学习
Recommended Citation (GB/T 7714):
杨宇婷. 联机手写文档的动态版面分析[D]. 自动化所, 2022.
Files in This Item:
File: 杨宇婷毕业论文《联机手写文档的动态版面分 (3190 KB); Document Type: Thesis; Open Access; License: CC BY-NC-SA