Knowledge Commons of Institute of Automation,CAS
Multi-teacher Knowledge Distillation for End-to-End Text Image Machine Translation
Ma, Cong1,2; Zhang, Yaping1,2; Tu, Mei4; Zhao, Yang1,2; Zhou, Yu2,3; Zong, Chengqing1,2
2023-08
Conference | The 17th International Conference on Document Analysis and Recognition (ICDAR 2023) |
Proceedings | Proceedings of the 17th International Conference on Document Analysis and Recognition (ICDAR 2023) |
Conference Dates | August 21-26, 2023 |
Conference Venue | San José, California, USA |
Abstract | Text image machine translation (TIMT), which translates source-language text in images into target-language sentences, is widely used in real-world applications. Existing TIMT methods fall into two categories: recognition-then-translation pipeline models and end-to-end models. However, how to transfer knowledge from the pipeline model to the end-to-end model remains an unsolved problem. In this paper, we propose a novel Multi-Teacher Knowledge Distillation (MTKD) method to effectively distill knowledge from the pipeline model into the end-to-end TIMT model. Specifically, three teachers are utilized to improve the end-to-end TIMT model: the image encoder is optimized with knowledge-distillation guidance from the recognition teacher encoder, while the sequential encoder and decoder are improved by transferring knowledge from the translation sequential-encoder and decoder teacher models. Furthermore, both token-level and sentence-level knowledge distillation are incorporated to further boost translation performance. Extensive experimental results show that our proposed MTKD effectively improves text image translation performance and outperforms existing end-to-end and pipeline models with fewer parameters and less decoding time, illustrating that MTKD can take advantage of both pipeline and end-to-end models. |
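The abstract mentions token-level knowledge distillation from teacher to student models. As a minimal, hypothetical sketch (not the paper's actual implementation — the function names, temperature value, and toy logits below are illustrative only), token-level KD is commonly formulated as a KL divergence between temperature-softened teacher and student output distributions at each decoding position:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over a single token's logits.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def token_kd_loss(teacher_logits, student_logits, temperature=2.0):
    """Token-level KD loss: KL(teacher || student) averaged over
    token positions, scaled by T^2 as in standard distillation."""
    loss = 0.0
    for t_tok, s_tok in zip(teacher_logits, student_logits):
        p = softmax(t_tok, temperature)  # teacher soft targets
        q = softmax(s_tok, temperature)  # student predictions
        loss += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * loss / len(teacher_logits)

# Toy example: two token positions over a 3-word vocabulary.
teacher = [[2.0, 0.5, -1.0], [0.1, 0.3, 0.2]]
student = [[1.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
print(token_kd_loss(teacher, student))
```

The loss is zero when the student exactly matches the teacher's distributions and grows as they diverge; in practice it would be combined with the usual cross-entropy translation loss.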
Indexed By | EI |
Sub-direction Classification | Natural Language Processing |
SKL Planned Research Direction | Speech and Language Processing |
Associated Dataset to Be Deposited | No |
Document Type | Conference Paper |
Identifier | http://ir.ia.ac.cn/handle/173211/57620 |
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems_Natural Language Processing |
Corresponding Author | Zhang, Yaping |
Affiliations | 1. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, P.R. China 2. State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences, Beijing, China 3. Fanyu AI Laboratory, Zhongke Fanyu Technology Co., Ltd, Beijing 100190, P.R. China 4. Samsung Research China - Beijing (SRC-B) |
First Author's Affiliation | Institute of Automation, Chinese Academy of Sciences |
Corresponding Author's Affiliation | Institute of Automation, Chinese Academy of Sciences |
Recommended Citation (GB/T 7714) | Ma, Cong, Zhang, Yaping, Tu, Mei, et al. Multi-teacher Knowledge Distillation for End-to-End Text Image Machine Translation[C], 2023. |
Files in This Item | 5-ICDAR2023-Springer (1478 KB), Conference Paper, Open Access, License: CC BY-NC-SA |
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.