Research on Image Text Translation Methods Incorporating Confidence

	伍凌辉 (Wu Linghui)
	2022-05-19
Pages	77
Degree type	Master's
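Chinese abstract (translated)	Image text translation aims to translate source-language text embedded in images into a target language. Current mainstream image text translation systems are usually built by cascading independent optical character recognition (OCR) and machine translation (MT) models: the OCR model recognizes the image text as a transcript, and the MT model translates the transcript into the target language. Cascaded models, however, suffer widely from error accumulation: noise in the transcript produced by OCR causes errors in the subsequent translation. Yet neither current cascaded models nor robust neural machine translation methods, which mitigate the effect of noisy text on translation, take into account the confidence information output by the OCR model. This thesis therefore takes confidence as its starting point and studies the error accumulation problem in cascaded models. The main research contents can be summarized in the following two points:
(1) An image text machine translation method based on confidence-gated attention. To mitigate the problems caused by noisy text, robust machine translation currently relies mainly on two approaches: 1) using synthetic noisy text to simulate the noise introduced by OCR transcription; 2) using contrastive learning between clean and noisy text to pull their distributions closer. These approaches have two shortcomings: 1) they ignore the confidence information from the OCR model and do not consider an effective fusion of the OCR and MT systems; 2) they use only synthetic noise, which is limited in type and cannot cover the noise types that occur in practice. To address these problems, this thesis proposes an image text machine translation method based on confidence-gated attention. Unlike previous cascaded approaches, the proposed method integrates the confidence of OCR character recognition into the subsequent translation framework, which mitigates the effect of OCR recognition errors on translation. In addition, based on the characteristics of noise in OCR transcripts, this thesis designs corresponding supervision text that provides the model with a subword-granularity contrastive loss. Experiments show that the proposed method significantly improves the translation performance of cascaded image text translation models.
(2) An image text machine translation method that fuses confidence and image information. Image information can provide additional information for words with low confidence, so this thesis proposes an image text machine translation method that fuses confidence and image information. An image encoder is introduced to encode the image information, and confidence is used to fuse image information into individual subwords, compensating for the loss caused by introducing confidence. However, image information contains redundant content such as background and font color, which affects the generalization performance of the model. This thesis therefore introduces contrastive learning at the image encoder so that identical characters have identical representations, which improves the generalization performance of the model. Experiments show that, once image information is introduced, the proposed method further improves the translation performance of cascaded image text translation models.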
English abstract	Image text machine translation aims to translate source-language text embedded in images into the target language. An image text translation system is usually a cascade of optical character recognition (OCR) and machine translation (MT) models. The OCR model recognizes the image text as a transcribed text, and the MT model then translates the transcribed text into the target language. However, error accumulation is a widespread problem in the cascade model: noise in the transcribed text output by OCR causes errors in the subsequent translation. Neither the current cascade models nor the robust neural machine translation methods that mitigate the impact of noisy text on translation take into account the confidence information output by the OCR model. Therefore, this thesis takes confidence as the starting point for research on error accumulation in the cascade model. The main research contents can be summarized as follows:
(1) An image text machine translation method based on confidence-gated attention is proposed. At present, to alleviate the problems caused by noisy text, robust machine translation mainly adopts two methods: 1) using synthetic noisy text to simulate the noise caused by OCR transcription; 2) using contrastive learning between clean text and noisy text to bring their distributions closer. These methods fail to consider the following problems: 1) they ignore the confidence information from the OCR model and do not consider the effective integration of the OCR and MT systems; 2) they use only synthetic noise, which is of a single type and cannot cover the noise types seen in practice. To solve these problems, this thesis proposes an image text machine translation method based on confidence-gated attention. Unlike previous cascade model methods, the proposed method integrates the confidence of OCR character recognition into the subsequent translation framework and can alleviate the impact of OCR recognition errors on subsequent translation. In addition, according to the characteristics of noise in OCR transcribed text, this thesis designs corresponding supervised text to provide the model with a contrastive loss at subword granularity. Experiments show that the proposed method can significantly improve the translation performance of the cascaded image text translation model.
(2) An image text machine translation method combining confidence and image information is proposed. Image information can provide additional information for words with low confidence, and the method is built on this observation. An image encoder is introduced to encode the image information, and the image information is fused into different subwords according to confidence to compensate for the loss caused by the introduction of confidence. However, redundant information in the image, such as background and font color, affects the generalization performance of the model. Therefore, this thesis introduces contrastive learning in the image encoder so that the same characters have the same representation, which improves the generalization performance of the model. Experiments show that, after introducing image information, the proposed method can further improve the translation performance of the cascaded image text translation model.
Keywords	confidence; image text translation; robust neural machine translation
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/48562
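Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems_Natural Language Processing Graduates_Master's Thesis Graduates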
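Recommended citation (GB/T 7714)	伍凌辉 (Wu Linghui). Research on Image Text Translation Methods Incorporating Confidence[D]. University of Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2022.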
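
The abstract names two mechanisms without giving their equations: attention gated by OCR confidence, and confidence-based fusion of image information into subword representations. The sketch below is one plausible reading of each, written in plain Python for illustration only. It is not the thesis's implementation: the function names, the choice to add the log-confidence to the attention logits, and the convex-combination fusion rule are all assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_gated_attention(queries, keys, values, confidences, eps=1e-6):
    """Scaled dot-product attention with a per-key confidence gate.

    ASSUMPTION: the gate is log(confidence) added to each attention logit,
    which is equivalent to multiplying the unnormalized attention weight of
    key j by confidences[j] and renormalizing. The thesis may gate differently.
    """
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        logits = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            + math.log(max(c, eps))  # low OCR confidence lowers the logit
            for k, c in zip(keys, confidences)
        ]
        weights = softmax(logits)
        out = [
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


def fuse_by_confidence(text_feats, image_feats, confidences):
    """Blend text and image features per subword.

    ASSUMPTION: a convex combination c * text + (1 - c) * image, so that
    low-confidence subwords lean more heavily on the image representation.
    """
    return [
        [c * t + (1.0 - c) * g for t, g in zip(t_vec, g_vec)]
        for t_vec, g_vec, c in zip(text_feats, image_feats, confidences)
    ]
```

With two identical keys and confidences of 0.9 and 0.1, the gated attention weights come out in the same 0.9 to 0.1 ratio, so a token the OCR model is unsure about contributes proportionally less to the encoded context.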

Files in this item
File name/size	Document type	Version type	Access type	License
毕业论文-融合置信度的图像文本翻译方法研（4723KB）	Thesis		Open access	CC BY-NC-SA