Handwritten Mathematical Expression Recognition Based on Deep Structured Learning


	Wu Jinwen
	2021-12
Pages	124
Degree	Ph.D.
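Chinese abstract (English translation)	Recognizing handwritten mathematical expressions matters for education, science communication, and automation. Compared with general text recognition or image recognition, handwritten mathematical expressions have complex layouts and diverse content. As a result, symbol detection, symbol classification, and structural relation reasoning for these expressions are all highly challenging. This thesis studies the recognition and structure parsing of handwritten mathematical expressions. Building on ideas from deep learning and structured learning, it proposes several effective models and methods that perform well in handwritten mathematical expression recognition experiments. The main contributions are as follows:
1. A handwritten mathematical expression recognition method based on paired adversarial learning. An attention-based neural decoder decodes a LaTeX symbol sequence from the visual representation of the input expression. During training, standard printed mathematical expressions serve as templates. The model is trained to attend to the same symbols in the handwritten expression and in the printed template. At the same time, an adversarial mechanism pushes the network toward semantically invariant symbol features, which makes it more robust to variation in writing style. On public handwritten expression recognition datasets, the method achieves competitive results.
2. A handwritten mathematical expression recognition method based on a pre-aware unit. Attention-based implicit segmentation models often over-attend or under-attend to some symbols when they process similar-looking symbols or complex structures. This causes symbols to be recognized twice or missed during decoding. To address this, the method uses a decoder built on a pre-aware unit, which embeds the spatial information of the symbol reading process into the attention mechanism. The recognizer can then accurately learn the visual and semantic correspondence of each symbol in parallel. Experiments show that the method effectively improves recognition accuracy.
3. A method for handwritten mathematical expression recognition and structure parsing based on graph-to-graph generation. Both the input handwritten expression data and the output structured representation are represented as graphs. The method recognizes online handwritten expressions end to end while also modeling and parsing their hierarchical structure. Experimental results show that it sets new state-of-the-art recognition accuracy by a clear margin on several public datasets. It can also explicitly segment the mathematical symbols in online handwritten expressions, and it extends well to offline expressions.
4. A weakly supervised graph-to-graph method for online handwritten mathematical expression recognition based on symbol prototypes. Graph-to-graph generation models normally depend on large-scale symbol-level annotations for training. To reduce this dependence, the method first learns prototypes of mathematical symbols on a closed symbol set and uses them to segment and classify symbols. It then learns the hierarchical structure and contextual information among symbols from expression-level weak labels. Experimental results show that, under weak supervision, the method achieves competitive results on multiple public datasets.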
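Contribution 3 represents the recognized output as a graph rather than a flat LaTeX string. The abstract does not give the thesis's graph schema, so the following minimal Python sketch rests on an assumption. It treats the output graph as a symbol layout tree whose edges carry the spatial relation labels used in the CROHME label-graph format (Right, Sup, Sub, Above, Below, Inside). `Node` and `to_latex` are hypothetical names for illustration. The sketch only shows how such a tree serializes back into LaTeX markup; it does not show how the thesis generates the tree.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """Hypothetical symbol layout tree node: one symbol plus labeled child edges."""
    symbol: str
    # Assumed CROHME-style relation label ("Right", "Sup", "Sub",
    # "Above", "Below", "Inside") -> child node.
    children: Dict[str, "Node"] = field(default_factory=dict)


def to_latex(node: Optional[Node]) -> str:
    """Serialize a symbol layout tree into LaTeX (simplified, no error checking)."""
    if node is None:
        return ""
    c = node.children
    if node.symbol == r"\frac":
        # Fraction bar: numerator is Above, denominator is Below.
        out = r"\frac{" + to_latex(c.get("Above")) + "}{" + to_latex(c.get("Below")) + "}"
    elif node.symbol == r"\sqrt":
        # Radical: the radicand is Inside.
        out = r"\sqrt{" + to_latex(c.get("Inside")) + "}"
    else:
        out = node.symbol
        if "Sub" in c:
            out += "_{" + to_latex(c["Sub"]) + "}"
        if "Sup" in c:
            out += "^{" + to_latex(c["Sup"]) + "}"
    # Continue along the writing baseline.
    if "Right" in c:
        out += " " + to_latex(c["Right"])
    return out


# Tree for x^2 + \frac{1}{y}
expr = Node("x", {
    "Sup": Node("2"),
    "Right": Node("+", {
        "Right": Node(r"\frac", {"Above": Node("1"), "Below": Node("y")}),
    }),
})
print(to_latex(expr))  # x^{2} + \frac{1}{y}
```

A tree makes the two-dimensional relations explicit. Symbol segmentation and relation parsing then become labeling problems on nodes and edges, while the LaTeX string is derived only at the end.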
English abstract	The recognition of handwritten mathematical expressions is important to the fields of education, science and office automation. Compared with other visual recognition tasks, such as text recognition and image classification, handwritten mathematical expressions have more complex layouts and more divergent writing styles. Hence, symbol detection and recognition, as well as structure analysis, of handwritten mathematical expressions pose great challenges. This thesis studies the recognition and structure analysis of handwritten mathematical expressions. Taking advantage of deep learning and structured learning, it proposes several effective methods for handwritten mathematical expression recognition (HMER) that achieve superior performance on public datasets. The main contributions are as follows:
1. A paired adversarial learning based HMER method is proposed. It parses the visual representation of the input formula data into LaTeX markup with an attention-based neural decoder. During training, standard printed mathematical formula images serve as templates. The model is guided to attend to the corresponding symbols in the handwritten formulas and the printed templates, and adversarial learning drives it toward semantic-invariant features of math symbols, which enhances robustness to writing style variation. Experimental results show that the method performs competitively on public datasets.
2. An HMER method based on a pre-aware unit is proposed. It addresses a problem of attention-based implicit symbol segmentation models: for similar symbols or complex structures, they tend to over- or under-attend to some symbols, so those symbols are duplicated or lost. The proposed decoder is built on the pre-aware unit, which embeds the spatial information of already-read symbols into the attention mechanism. This lets the recognizer accurately learn the visual and semantic correspondence of each symbol in parallel. Experimental results show that the method improves HMER accuracy significantly.
3. A graph-to-graph generation based method is proposed for HMER and structure analysis. Both the input handwritten expression data and the output markup are formulated as graphs. The model explores the hierarchical structure of expressions, enables symbol segmentation and relation parsing, and can be learned end to end. Experimental results show that the method sets new state-of-the-art recognition accuracy on several public datasets and explicitly segments the mathematical symbols in online handwritten formulas. It also extends to offline mathematical formulas.
4. A weakly supervised, symbol-prototype-based graph-to-graph learning method is proposed for online HMER. Graph-to-graph generation models normally rely on large datasets with symbol-level annotations. To reduce that reliance, the method first learns symbol prototypes on a closed math symbol set, then learns symbol segmentation and structure from formula-level labeled data. Experimental results show that it achieves competitive results on multiple public datasets.
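Contribution 4 classifies symbols against prototypes learned on a closed symbol set. The abstract does not say how the prototypes are defined or compared. The sketch below therefore swaps in the common prototypical-network recipe as an assumption:

- each class prototype is the mean of that class's embeddings;
- a candidate stroke group takes the label of the nearest prototype by Euclidean distance.

The function names and the `reject_distance` threshold are hypothetical additions. The threshold flags candidates that match no symbol well, which is one plausible way prototypes could support segmentation. The code needs Python 3.8+ for `math.dist`.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def compute_prototypes(embeddings: Sequence[Sequence[float]],
                       labels: Sequence[str]) -> Dict[str, List[float]]:
    """Average the embeddings of each symbol class into one prototype vector."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for vec, lab in zip(embeddings, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def classify(embedding: Sequence[float],
             prototypes: Dict[str, List[float]],
             reject_distance: Optional[float] = None) -> Tuple[Optional[str], float]:
    """Label a candidate stroke group by its nearest prototype (Euclidean distance).

    Hypothetical rejection rule: if reject_distance is given and even the
    nearest prototype is farther away, return None (not a valid symbol).
    """
    label, dist = min(((lab, math.dist(embedding, proto))
                       for lab, proto in prototypes.items()),
                      key=lambda pair: pair[1])
    if reject_distance is not None and dist > reject_distance:
        return None, dist
    return label, dist


# Toy 2-D embeddings standing in for the output of a learned symbol encoder.
protos = compute_prototypes([[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 1.2]],
                            ["x", "x", "+", "+"])
print(classify([0.1, 0.1], protos))                       # nearest prototype is "x"
print(classify([5.0, 5.0], protos, reject_distance=1.0))  # label is None (rejected)
```

In this sketch, only the prototypes depend on symbol-level labels. Once they are fixed, candidate segmentations of a new expression can be scored by distance alone.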
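Keywords	handwritten mathematical expression recognition; paired adversarial learning; pre-aware unit; graph-to-graph generation; symbol prototype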
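Language	Chinese
Document type	Dissertation
Item identifier	http://ir.ia.ac.cn/handle/173211/47472
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems / Pattern Analysis and Learning
Corresponding author	Wu Jinwen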
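Recommended citation (GB/T 7714)	Wu Jinwen. Handwritten Mathematical Expression Recognition Based on Deep Structured Learning[D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2021.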

Files in this item
File name/size	Document type	Version type	Access type	License
132839414497723750.p (4312KB)	Dissertation		Open access	CC BY-NC-SA