Transformer-Based Geometric Primitive Detection and Analysis (基于Transformer的几何基元检测与分析)
周威 (Zhou Wei)
2024-07-05
Pages: 80
Degree type: Master's
Chinese Abstract

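Geometric primitive analysis has important applications in areas such as flowchart recognition and analysis. However, because primitive types are complex and varied, and primitive parameters are difficult to represent and optimize, it has long been a difficult research problem. Geometric primitive detection and geometric primitive relationship analysis are its two main tasks. Detection identifies the category and position of each primitive, while relationship analysis identifies the relationships between primitives, such as connections. Some current detection methods are extended from classical object detection methods, which represent objects with rectangular boxes. A rectangular box cannot represent a geometric primitive precisely, so precise primitive parameters cannot be obtained. In addition, some existing methods for analyzing relationships between objects use complex model structures, are difficult to train, and require additional complex post-processing. To address these problems, this thesis studies geometric primitive detection and the recognition of connection relationships between primitives in flowcharts. Its main contents and contributions are as follows: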
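(1) A general geometric primitive representation and detection scheme for flowchart parsing is proposed, and a geometric primitive dataset for regular flowcharts is constructed. To address the inability of rectangular boxes to represent geometric primitives precisely, this thesis proposes a general representation based on multi-keypoint sequences, which is more accurate and describes the shapes of various types of primitives more precisely. On this basis, it proposes a general multi-keypoint detection scheme comprising two methods for detecting primitives from multiple keypoints. Because intersection-over-union (IoU) computed from bounding rectangles cannot accurately reflect the overlap between primitives, this thesis also proposes a multi-keypoint IoU calculation that uses the polar coordinates of the keypoint sequences to compute the positional offset between primitives and thus reflects their overlap more faithfully. Furthermore, to address the lack of datasets for primitive detection and relationship analysis in regular flowcharts, this thesis constructs a flowchart-oriented geometric primitive dataset. It contains 8,000 machine-generated flowchart images covering 9 categories of geometric primitives, with more than 240,000 annotations, including primitive keypoint positions and relationships between primitives. Experimental results show that the proposed multi-keypoint detection scheme effectively improves geometric primitive detection performance.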
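(2) A primitive relationship analysis method based on adjacency matrix prediction is proposed. Existing methods for analyzing relationships between objects are overly complex and require post-processing, so this thesis proposes a single-stage primitive relationship analysis method based on adjacency matrix prediction. The method models relationship analysis as a directed graph recognition problem in which nodes represent geometric primitives and directed edges represent connections between primitives. Relationship analysis thereby becomes the problem of predicting the adjacency matrix of the graph. In addition, following the idea of task decoupling, this thesis proposes a dynamic relationship adjacency matrix prediction loss that makes the model focus more on geometric primitive detection early in training and more on relationship analysis later in training. Experimental results show that the proposed method effectively recognizes the connections between primitives.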
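(3) A flowchart detection and reconstruction system is built. The system runs in all major browsers and is cross-platform, highly compatible, and easy to interact with. Specifically, this thesis builds a system with separated front end and back end on a browser/server architecture. The user interface is displayed in the browser for user operation, while flowchart recognition and reconstruction are performed on the server, which lowers the performance required of the user's device and improves the user experience. The system demonstration shows that the proposed methods have high practical value in real-world scenarios.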
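
The motivation behind contribution (1) can be illustrated with a minimal Python sketch that compares a box-based IoU with the true area overlap for a rectangle and a diamond inscribed in it. The shapes, coordinates, and function names (`shoelace_area`, `bounding_box`, `box_iou`) are assumptions made for illustration only. This is not the thesis's polar-coordinate IoU, whose exact formulation the abstract does not give.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def shoelace_area(keypoints: List[Point]) -> float:
    """Area of a simple polygon given as an ordered keypoint sequence."""
    total = 0.0
    n = len(keypoints)
    for i in range(n):
        x1, y1 = keypoints[i]
        x2, y2 = keypoints[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def bounding_box(keypoints: List[Point]) -> Box:
    """Axis-aligned bounding box of a keypoint sequence."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    return min(xs), min(ys), max(xs), max(ys)


def box_iou(a: Box, b: Box) -> float:
    """Classical IoU between two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical primitives: a rectangle and a diamond inscribed in it.
rectangle = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
diamond = [(2.0, 0.0), (4.0, 1.0), (2.0, 2.0), (0.0, 1.0)]

# Box-based IoU only sees the two (identical) bounding boxes.
bbox_score = box_iou(bounding_box(rectangle), bounding_box(diamond))

# The diamond lies entirely inside the rectangle, so the intersection is the
# diamond's area and the union is the rectangle's area.
shape_score = shoelace_area(diamond) / shoelace_area(rectangle)

print(bbox_score, shape_score)
```

Both shapes share the same bounding box, so the box-based score is 1.0 even though the diamond covers only half of the rectangle's area. A keypoint-sequence representation retains the information needed to tell the two shapes apart.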

English Abstract

Geometric primitive analysis has significant application value in fields such as flowchart recognition and analysis. However, it has long been a research challenge because primitive types are complex and diverse and primitive parameters are difficult to represent and optimize.

Geometric primitive detection and analysis of geometric primitive relationships are two crucial tasks in geometric primitive analysis. Geometric primitive detection aims to identify the categories and positions of geometric primitives, while the analysis of geometric primitive relationships focuses on recognizing the relationships between primitives, such as connection relationships. Some of the current geometric primitive detection methods are extensions of classical object detection methods, which use rectangular boxes to represent objects. However, using rectangular boxes does not accurately represent geometric primitives, leading to imprecise geometric primitive parameters. Additionally, certain existing methods for analyzing relationships between objects employ complex model structures, making model training challenging and requiring additional complex post-processing methods.

To address these issues, this thesis studies the detection of geometric primitives in flowcharts and the identification of connection relationships between them. The main contents and contributions of this thesis are as follows:

(1) A general geometric primitive representation and detection scheme for flowchart analysis is proposed, along with a geometric primitive dataset designed for regular flowcharts. Because rectangular boxes represent geometric primitives inaccurately, this thesis presents a general geometric primitive representation method based on multiple keypoint sequences, which describes the shapes of various types of geometric primitives more precisely. On top of this representation, a general detection scheme based on multiple keypoints is proposed, comprising two keypoint-based detection methods. Because bounding-box-based intersection-over-union (IoU) fails to reflect the degree of overlap between primitives accurately, a keypoint-based IoU calculation method is introduced. It uses the polar coordinates of the keypoint sequences to calculate the positional displacement between geometric primitives, which reflects their degree of overlap more faithfully. In addition, to address the lack of datasets for geometric primitive detection and relationship analysis in regular flowcharts, a geometric primitive dataset tailored to flowcharts is constructed. The dataset comprises 8,000 machine-generated flowchart images covering nine categories of geometric primitives, with over 240,000 annotations, including keypoint positions and relationships between primitives. Experimental results demonstrate that the proposed general geometric primitive detection scheme based on multiple keypoint sequences effectively improves the performance of geometric primitive detection.

(2) A method for geometric primitive relationship analysis based on adjacency matrix prediction is proposed. Addressing the drawbacks of existing methods for analyzing relationships between objects, which are overly complex and require post-processing, this thesis presents a one-stage geometric primitive relationship analysis method based on adjacency matrix prediction. In this method, the analysis of geometric primitive relationships is modeled as a directed graph recognition problem, where nodes in the graph represent geometric primitives and directed edges represent the connection relationships between primitives. Therefore, the problem of analyzing relationships between primitives is transformed into a problem of predicting the adjacency matrix of the graph. Additionally, based on the idea of task decoupling, this thesis introduces a dynamic relationship adjacency matrix prediction loss, which allows the model to focus more on geometric primitive detection in the early stages of training and shift its attention to geometric primitive relationship analysis in the later stages. Experimental results demonstrate that the proposed method effectively identifies the connection relationships between primitives.

(3) A flowchart detection and reconstruction system is constructed. The system runs in all major browsers and offers cross-platform support, high compatibility, and user-friendly interaction. Specifically, this thesis builds a system with separated front end and back end on a browser/server architecture. The user interface is presented in the browser to facilitate user operations, while flowchart recognition and reconstruction are handled on the server side, which reduces the performance requirements on user devices and improves the user experience. The system demonstration shows that the proposed methods have significant practical value in real-world scenarios.
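
As a rough illustration of contribution (2), the sketch below encodes the connections of a small flowchart as the adjacency matrix of a directed graph and decodes edges back from a score matrix by thresholding. It also shows one possible way to shift a training objective from detection toward relationship analysis over time. Everything here is an assumption made for illustration: the example graph, the function names (`build_adjacency`, `decode_edges`, `relation_weight`), the 0.5 threshold, and in particular the linear ramp. The abstract does not specify the form of the thesis's dynamic loss.

```python
from typing import List, Tuple

Edge = Tuple[int, int]  # (source primitive index, target primitive index)


def build_adjacency(num_nodes: int, edges: List[Edge]) -> List[List[int]]:
    """Adjacency matrix of a directed graph: adj[i][j] = 1 if i connects to j."""
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for src, dst in edges:
        adj[src][dst] = 1
    return adj


def decode_edges(scores: List[List[float]], threshold: float = 0.5) -> List[Edge]:
    """Recover directed edges from a score matrix by thresholding (no self-loops)."""
    return [
        (i, j)
        for i, row in enumerate(scores)
        for j, score in enumerate(row)
        if i != j and score > threshold
    ]


def relation_weight(step: int, total_steps: int, max_weight: float = 1.0) -> float:
    """Hypothetical linear ramp: 0 at the start of training, max_weight at the end."""
    if total_steps <= 0:
        return max_weight
    progress = min(max(step / total_steps, 0.0), 1.0)
    return max_weight * progress


# Hypothetical flowchart: start(0) -> process(1) -> decision(2) -> end(3),
# with a loop back from the decision to the process.
edges = [(0, 1), (1, 2), (2, 3), (2, 1)]
adjacency = build_adjacency(4, edges)
recovered = decode_edges(adjacency)

# Assumed combination of the two task losses at training step 50 of 100.
detection_loss, relation_loss = 2.0, 4.0  # placeholder values, not real results
total_loss = detection_loss + relation_weight(50, 100) * relation_loss

print(adjacency)
print(recovered, total_loss)
```

Because the matrix is not symmetric, it preserves edge direction: `adjacency[1][2]` and `adjacency[2][1]` are both set here only because the example contains a loop, whereas `adjacency[0][1]` is set and `adjacency[1][0]` is not.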

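Keywords: primitive detection; relationship analysis; keypoints; Transformer
Language: Chinese
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/58574
Collection: Graduates / Master's Theses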
Recommended citation
GB/T 7714
周威. 基于Transformer的几何基元检测与分析[D],2024.
Files in this item
File name / size: 周威-学位论文-最终版.pdf (10295KB)
Document type: Thesis
Access type: Restricted access
License: CC BY-NC-SA