GraphMLLM: A Graph-based Multi-level Layout Language-independent Model for Document Understanding
He-Sen Dai [1,2]; Xiao-Hui Li [2]; Fei Yin [2]; Xudong Yan [3]; Shuqi Mei [3]; Cheng-Lin Liu [1,2]
2024-09
Conference Name: International Conference on Document Analysis and Recognition
Conference Date: 2024-09
Conference Place: Athens, Greece
Publisher: Springer
Abstract

Self-supervised multi-modal document pre-training for document knowledge learning has shown superiority in various downstream tasks. However, due to the diversity of document languages and structures, there is still room to better model various document layouts while efficiently utilizing pre-trained language models. To this end, this paper proposes a Graph-based Multi-level Layout Language-independent Model (GraphMLLM), which uses a dual-stream structure to explore textual and layout information separately and cooperatively. Specifically, GraphMLLM consists of a text stream, which uses an off-the-shelf pre-trained language model to explore textual semantics, and a layout stream, which uses a multi-level graph neural network (GNN) to model hierarchical page layouts. Through the cooperation of the text stream and the layout stream, GraphMLLM can model multi-level page layouts more comprehensively and improve the performance of language-independent document pre-trained models. Experimental results show that, compared with previous state-of-the-art methods, GraphMLLM yields higher performance on downstream visual information extraction (VIE) tasks after pre-training on fewer documents. Code and model will be available at https://github.com/HSDai/GraphMLLM.
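
The dual-stream idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `Word` type, the hand-made feature vectors, the two-level word/segment hierarchy, the mean aggregation, and the additive fusion are all assumptions chosen for brevity. The `text_stream` function is only a placeholder for a pre-trained language model, and `layout_stream` stands in for a multi-level GNN with a single round of mean-based message passing.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]


@dataclass
class Word:
    """Hypothetical input unit: a word, its box, and its segment index."""
    text: str
    box: Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized to [0, 1]
    segment: int  # index of the text segment (e.g. a line) containing the word


def box_features(box: Tuple[float, float, float, float]) -> Vector:
    """Layout features of a box: centre x, centre y, width, height."""
    x0, y0, x1, y1 = box
    return [(x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def text_stream(words: List[Word]) -> List[Vector]:
    """Placeholder for a pre-trained language model (assumption):
    returns one 4-dimensional vector per word from trivial text statistics."""
    return [[len(w.text) / 10.0, float(w.text.isdigit()), 0.0, 0.0] for w in words]


def layout_stream(words: List[Word]) -> List[Vector]:
    """Toy two-level layout model using only box geometry."""
    word_feats = [box_features(w.box) for w in words]
    seg_ids = sorted({w.segment for w in words})
    # Level 1 (word -> segment): pool the features of the words in each segment.
    seg_feats = {
        s: mean([f for f, w in zip(word_feats, words) if w.segment == s])
        for s in seg_ids
    }
    # Level 2 (segment <-> segment): treat segments as a fully connected graph
    # and average each segment with the mean over all segments.
    page_context = mean(list(seg_feats.values()))
    seg_feats = {s: mean([f, page_context]) for s, f in seg_feats.items()}
    # Pass segment-level context back down to each word.
    return [mean([f, seg_feats[w.segment]]) for f, w in zip(word_feats, words)]


def fuse(words: List[Word]) -> List[Vector]:
    """Combine the two streams by element-wise addition (assumption)."""
    return [
        [t + l for t, l in zip(tv, lv)]
        for tv, lv in zip(text_stream(words), layout_stream(words))
    ]


# Example page with two segments (made-up data).
words = [
    Word("Total", (0.1, 0.1, 0.3, 0.2), 0),
    Word("42", (0.4, 0.1, 0.5, 0.2), 0),
    Word("Date", (0.1, 0.5, 0.3, 0.6), 1),
]
fused = fuse(words)
```

Because the layout stream here reads only box geometry and never the text, swapping the placeholder `text_stream` for a different language model would leave it unchanged, which is the sense in which such a design can be language-independent.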

Keyword: Visual information extraction; Self-supervised pre-training; Multi-level page layouts
Language: English
IS Representative Paper
Sub direction classification: Text recognition and document analysis
Planning direction of the State Key Laboratory: Fundamental and frontier theories of artificial intelligence
Paper associated data
Document Type: Conference paper
Identifier: http://ir.ia.ac.cn/handle/173211/57238
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems_Pattern Analysis and Learning
Corresponding Author: Xiao-Hui Li
Affiliation:
1. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China
2. State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation of Chinese Academy of Sciences, Beijing, 100190, China
3. T Lab, Tencent Map, Tencent Technology (Beijing) Co., Ltd., Beijing, 100193, China
Recommended Citation
GB/T 7714
He-Sen Dai, Xiao-Hui Li, Fei Yin, et al. GraphMLLM: A Graph-based Multi-level Layout Language-independent Model for Document Understanding[C]: Springer, 2024.
Files in This Item:
File Name/Size | DocType | Version | Access | License
0059.pdf (967KB) | Conference paper | | Open access | CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.