Knowledge Commons of Institute of Automation,CAS
Modal Contrastive Learning Based End-to-End Text Image Machine Translation | |
Ma, Cong1,2 | |
Journal | IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM TASLP) |
ISSN | 2329-9290 |
Publication date | 2023-10 |
Volume | 32, Issue: 32, Pages: 2153-2165 |
Corresponding author | Zong, Chengqing (cqzong@nlpr.ia.ac.cn) |
Abstract | Text image machine translation (TIMT) aims at directly translating text in the source language embedded in images into the target language. Most existing systems follow the cascaded pipeline diagram from recognition to translation, which suffers from the problems of error propagation, parameter redundancy, and information reduction. The end-to-end model has the potential to alleviate these issues via bridging the recognition and translation models. However, the challenge is the data limitation and modality gap between text and image. In this paper, we propose a novel end-to-end model, namely Modal contrastive learning based End-to-end Text Image Machine Translation (METIMT), which alleviates these issues through end-to-end text image machine translation architecture and modal contrastive learning. Specifically, an image encoder is designed to encode images into the same feature space of corresponding text sentences, with the guidance of an intra-modal and inter-modal contrastive learning module. To further promote the research of text image machine translation, we have constructed one synthetic and two real-world datasets. Extensive experiments show that our lighter, faster model outperforms not only existing pipeline methods but also state-of-the-art end-to-end models on both synthetic and real-world evaluation sets. Our code and dataset will be released to the public. |
Keywords | Transformers; Machine translation; Decoding; Semantics; Pipelines; Text recognition; Task analysis; Text image machine translation; contrastive learning; text image recognition; machine translation |
DOI | 10.1109/TASLP.2023.3324540 |
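Keywords [WOS] | RECOGNITION |
Indexed by | SCI |
Language | English |
Funding project | National Natural Science Foundation of China |
Funding organization | National Natural Science Foundation of China |
WOS research area | Acoustics ; Engineering |
WOS subject category | Acoustics ; Engineering, Electrical & Electronic |
WOS accession number | WOS:001197778500003 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
Seven major directions: sub-direction classification | Natural language processing |
State Key Laboratory planned research direction | Speech and language processing |
Paper-associated dataset requiring deposit | No |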
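Document type | Journal article |
Item identifier | http://ir.ia.ac.cn/handle/173211/57613 |
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems_Natural Language Processing |
Corresponding author | Zhang, Yaping |
Author affiliations | 1. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, P.R. China 2. State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences, Beijing, China 3. Fanyu AI Laboratory, Zhongke Fanyu Technology Co., Ltd, Beijing 100190, P.R. China |
First author affiliation | Institute of Automation, Chinese Academy of Sciences |
Corresponding author affiliation | Institute of Automation, Chinese Academy of Sciences |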
Recommended citation GB/T 7714 | Ma, Cong, Han, Xu, Wu, Linghui, et al. Modal Contrastive Learning Based End-to-End Text Image Machine Translation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM TASLP), 2023, 32(32): 2153-2165. |
APA | Ma, Cong., Han, Xu., Wu, Linghui., Zhang, Yaping., Zhao, Yang., ... & Zong, Chengqing. (2023). Modal Contrastive Learning Based End-to-End Text Image Machine Translation. IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM TASLP), 32(32), 2153-2165. |
MLA | Ma, Cong, et al. "Modal Contrastive Learning Based End-to-End Text Image Machine Translation." IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM TASLP) 32.32 (2023): 2153-2165. |
Files in this item |
File name/size | Document type | Version type | Access type | License |
3-TASLP-Modal_Contra(6551KB) | Journal article | Author accepted manuscript | Open access | CC BY-NC-SA |