Modal Contrastive Learning Based End-to-End Text Image Machine Translation
Ma, Cong (1,2); Han, Xu (1,2); Wu, Linghui (1,2); Zhang, Yaping (1,2); Zhao, Yang (1,2); Zhou, Yu (2,3); Zong, Chengqing (1,2)
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM TASLP)
ISSN: 2329-9290
Date: 2023-10
Volume: 32; Issue: 32; Pages: 2153-2165
Corresponding author: Zong, Chengqing (cqzong@nlpr.ia.ac.cn)
Abstract

Text image machine translation (TIMT) aims to directly translate text in the source language embedded in images into the target language. Most existing systems follow a cascaded pipeline from recognition to translation, which suffers from error propagation, parameter redundancy, and information reduction. An end-to-end model has the potential to alleviate these issues by bridging the recognition and translation models. However, the main challenges are limited data and the modality gap between text and image. In this paper, we propose a novel end-to-end model, namely Modal contrastive learning based End-to-end Text Image Machine Translation (METIMT), which alleviates these issues through an end-to-end text image machine translation architecture and modal contrastive learning. Specifically, an image encoder is designed to encode images into the same feature space as the corresponding text sentences, with the guidance of an intra-modal and inter-modal contrastive learning module. To further promote research on text image machine translation, we have constructed one synthetic and two real-world datasets. Extensive experiments show that our lighter, faster model outperforms not only existing pipeline methods but also state-of-the-art end-to-end models on both synthetic and real-world evaluation sets. Our code and dataset will be released to the public.
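
The contrastive objective mentioned in the abstract can be illustrated with a generic InfoNCE-style loss. The sketch below is not the authors' released code: the function names, the temperature value, and the way pairs are formed are all assumptions made for illustration. In particular, this record does not say how METIMT builds intra-modal pairs, so treating them as two views of the same modality is a guess. Features are plain Python lists so that the block runs with the standard library only.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def info_nce(anchors, candidates, temperature=0.1):
    """Mean InfoNCE loss where anchors[i] should match candidates[i].

    Every other candidate in the batch acts as a negative.
    """
    anchors = [l2_normalize(a) for a in anchors]
    candidates = [l2_normalize(c) for c in candidates]
    total = 0.0
    for i, a in enumerate(anchors):
        # Cosine similarity of anchor i with every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(a, c)) / temperature for c in candidates]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def inter_modal_loss(image_feats, text_feats, temperature=0.1):
    """Symmetric image-to-text and text-to-image loss (hypothetical helper)."""
    return 0.5 * (info_nce(image_feats, text_feats, temperature)
                  + info_nce(text_feats, image_feats, temperature))


def intra_modal_loss(view_a, view_b, temperature=0.1):
    """Contrast two views from the same modality (assumed pairing scheme)."""
    return 0.5 * (info_nce(view_a, view_b, temperature)
                  + info_nce(view_b, view_a, temperature))


if __name__ == "__main__":
    # Toy features: image i is close to sentence i in the shared space.
    img = [[1.0, 0.0], [0.0, 1.0]]
    txt = [[0.9, 0.1], [0.1, 0.9]]
    print("aligned:", inter_modal_loss(img, txt))
    print("shuffled:", inter_modal_loss(img, txt[::-1]))
```

With matched image and text features the loss is close to zero, and it grows when the pairing is shuffled, which is the pressure that pulls an image encoder's outputs toward the features of the corresponding sentences.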

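Keywords: Transformers; Machine translation; Decoding; Semantics; Pipelines; Text recognition; Task analysis; Text image machine translation; contrastive learning; text image recognition; machine translation
DOI: 10.1109/TASLP.2023.3324540
Keywords [WOS]: RECOGNITION
Indexed by: SCI
Language: English
Funding project: National Natural Science Foundation of China
Funding organization: National Natural Science Foundation of China
WOS research area: Acoustics; Engineering
WOS subject category: Acoustics; Engineering, Electrical & Electronic
WOS accession number: WOS:001197778500003
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Seven major directions, sub-direction classification: Natural language processing
State Key Laboratory planned direction classification: Speech and language processing
Associated dataset to be deposited: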
Document type: Journal article
Item identifier: http://ir.ia.ac.cn/handle/173211/57613
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems_Natural Language Processing
Corresponding author: Zhang, Yaping
Author affiliations:
1. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, P.R. China
2. State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences, Beijing, China
3. Fanyu AI Laboratory, Zhongke Fanyu Technology Co., Ltd, Beijing 100190, P.R. China
First author affiliation: Institute of Automation, Chinese Academy of Sciences
Corresponding author affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended citation
GB/T 7714: Ma, Cong, Han, Xu, Wu, Linghui, et al. Modal Contrastive Learning Based End-to-End Text Image Machine Translation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM TASLP), 2023, 32(32): 2153-2165.
APA: Ma, Cong., Han, Xu., Wu, Linghui., Zhang, Yaping., Zhao, Yang., ... & Zong, Chengqing. (2023). Modal Contrastive Learning Based End-to-End Text Image Machine Translation. IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM TASLP), 32(32), 2153-2165.
MLA: Ma, Cong, et al. "Modal Contrastive Learning Based End-to-End Text Image Machine Translation". IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM TASLP) 32.32 (2023): 2153-2165.
Files in this item
File name/size: 3-TASLP-Modal_Contra (6551KB); Document type: Journal article; Version: author accepted manuscript; Access: open access; License: CC BY-NC-SA
File name: 3-TASLP-Modal_Contrastive_Learning_Based_End-to-End_Text_Image_Machine_Translation.pdf
Format: Adobe PDF