CASIA OpenIR > 毕业生 (Graduates) > 博士学位论文 (Doctoral Dissertations)
基于迁移学习的鲁棒性图像文字识别 (Robust Image Text Recognition Based on Transfer Learning)
张亚萍
Subtype: 博士 (Doctoral)
Thesis Advisor: 刘文举
2020-05-29
Degree Grantor: 中国科学院大学 (University of Chinese Academy of Sciences)
Place of Conferral: 中国科学院大学 (University of Chinese Academy of Sciences)
Degree Discipline: 模式识别与智能系统 (Pattern Recognition and Intelligent Systems)
Keywords: Optical Character Recognition, Text Recognition, Domain Adaptation, Transfer Learning
Abstract

Thanks to the development of deep learning, image text recognition has made great progress. However, text images in real environments exhibit complex and dynamic variability: in handwriting scenarios, writing styles differ greatly across writers; in natural scenes, image backgrounds are complex and changeable. These factors inevitably shift the data distribution and degrade the performance of recognition models in practical applications. The root of the problem is that traditional machine learning methods usually assume that training and test data follow the same distribution, and thus cannot cope with the distribution shifts caused by dynamic real-world environments. Transfer learning, by contrast, can adaptively build learning models from data with different distributions. This thesis therefore studies adaptive transfer learning methods for robust image text recognition, combining domain knowledge of image text recognition and focusing on the key challenges of applying transfer learning to this field. The main contributions and innovations are summarized as follows:
  (1) Existing transfer learning methods mainly align the feature spaces of different domains directly through feature transformations, without considering prior knowledge about text images. To address this, an adversarial feature learning method that incorporates prior knowledge is proposed, which automatically mines the printed-font prior shared among different handwriting styles and guides the model to adaptively learn writer-independent, domain-invariant high-level features. Experiments on public datasets show that the method achieves the best performance and remains robust to handwriting-style variation with limited training data.
  (2) Transfer learning methods based on high-level feature-space alignment usually ignore semantic consistency during transfer. A novel semantically consistent bidirectional adversarial unsupervised domain adaptation method is proposed, which jointly transfers knowledge in the low-level pixel space and the high-level feature space, visualizes the transfer process, and preserves semantic consistency to ensure effective knowledge transfer. Experiments on multiple public datasets show that the method not only achieves the best performance on unsupervised cross-domain character recognition, but also visualizes the semantic consistency of the transfer process both qualitatively and quantitatively.
  (3) Global feature transformations ignore the character-level fine-grained information in text-image sequences and cannot effectively transfer knowledge for variable-length sequences. A sequence-to-sequence local domain adaptation method is proposed, which uses an attention mechanism to adaptively transfer local fine-grained character features, successfully applying transfer learning to sequence-level image text recognition and effectively transferring sequential information. Experiments on multiple public datasets show that the method is highly extensible and can handle domain shift in different scenarios, yielding consistent performance gains on scene text, handwritten text, and handwritten mathematical expressions, each with a different form of domain shift.
  (4) To address insufficient transfer of text-image sequences in complex scenes, especially irregular text recognition and cross-domain tasks with relatively large domain gaps, a spatial normalization transformation module is introduced to adaptively rectify the geometry of spatially irregular text images, reducing differences caused by affine geometric variation. Considering both the coarse-grained global background differences and the fine-grained local character differences of text images, a joint domain adaptation method combining global coarse-grained and local fine-grained alignment is further proposed, achieving sequence transfer in complex scenes. Experiments show that the method yields performance gains on text recognition transfer tasks of varying complexity. In particular, on the large-gap transfer task from synthetic scene text to handwritten scene text, the proposed method gains ten absolute percentage points.

Other Abstract

Deep learning methods have achieved remarkable results on text image reading. However, it remains challenging to build a robust text recognizer that can handle varying data in new scenarios effectively, due to the inevitable domain shift encountered at "test time". In real scenes, the text data distribution tends to be changed by multiple factors, such as different appearances in natural scene texts, various handwriting styles in handwritten texts, and even diverse structures in mathematical expressions.
Such domain shifts cause significant performance drops in many realistic applications. One intuitive solution is to collect large-scale annotated text images, but annotation is often extremely expensive and cannot cover all the diversity. It is therefore highly desirable to develop algorithms that adapt text image recognition models to a new domain that is visually different from the source training domain. An appealing alternative is to take advantage of easily available data from relevant domains: transfer learning uses data drawn from different distributions to reduce the domain shift. This thesis therefore focuses on adaptive transfer learning methods for robust image text recognition. The main contributions of this thesis are as follows:
  (1) A novel adversarial feature learning model is proposed that incorporates the prior knowledge of printed data to improve handwritten character recognition with limited training data. It automatically exploits writer-independent high-level semantic features, which in turn alleviates the large variance of handwriting styles. By combining the strengths of discriminative and generative models, it achieves better classification.
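As a rough illustration of the adversarial feature learning idea (not the thesis's actual architecture), the feature extractor can be trained with a character classification loss plus an adversarial term that makes handwritten features indistinguishable from printed-prior features. The sketch below, with made-up toy logits, just evaluates such a combined objective; all names and the weighting `lam` are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    # mean negative log-likelihood of the true character classes
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12)))

def adversarial_feature_loss(cls_logits, labels, dom_logits, lam=0.1):
    """Toy combined objective: character classification loss minus a weighted
    domain-confusion term. Minimizing it encourages features that classify
    characters well while fooling the handwritten-vs-printed discriminator
    (the gradient-reversal idea used in adversarial feature learning)."""
    cls_loss = cross_entropy(softmax(cls_logits), labels)
    # domain labels: 0 = handwritten, 1 = printed; a confused discriminator
    # has high prediction entropy, which we reward by subtracting it
    dom_probs = softmax(dom_logits)
    dom_entropy = float(-np.mean(np.sum(dom_probs * np.log(dom_probs + 1e-12), axis=-1)))
    return cls_loss - lam * dom_entropy

rng = np.random.default_rng(0)
cls_logits = rng.normal(size=(4, 10))   # 4 samples, 10 character classes
labels = np.array([0, 1, 2, 3])
dom_logits = rng.normal(size=(4, 2))    # handwritten vs printed scores
print(adversarial_feature_loss(cls_logits, labels, dom_logits))
```

A perfectly confident classifier together with a maximally confused domain discriminator drives this toy objective to its minimum, which is the qualitative behavior the adversarial training aims for.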
  (2) Most existing transfer learning methods align the high-level feature-space distribution between the source and target domains while neglecting semantic consistency and low-level pixel-space information. To solve this problem, a novel bidirectional adversarial domain adaptation method is proposed to simultaneously adapt the pixel-level and feature-level shifts with semantic consistency. To keep semantic consistency, a soft-label-based semantic consistency constraint is proposed, which takes advantage of the well-trained source classifier during the bidirectional adversarial mappings. Furthermore, semantic consistency during domain adaptation is analyzed for the first time, with both qualitative and quantitative evaluation.
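One common way to realize a soft-label semantic consistency constraint (sketched here with toy numbers; the thesis's exact formulation may differ) is to penalize the KL divergence between the frozen source classifier's soft predictions on an image before and after the adversarial pixel-level mapping:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def semantic_consistency_loss(logits_orig, logits_translated):
    """KL(p_orig || p_translated) averaged over the batch, where both
    probability vectors come from the same well-trained source classifier.
    A low value means the pixel-level mapping preserved the class semantics."""
    p = softmax(logits_orig)
    q = softmax(logits_translated)
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(np.mean(kl))

# identical soft labels -> zero loss; diverging soft labels -> positive loss
a = np.array([[2.0, 0.0, -1.0]])
print(semantic_consistency_loss(a, a))                                  # prints 0.0
print(semantic_consistency_loss(a, np.array([[0.0, 2.0, -1.0]])) > 0)   # prints True
```

Adding this term to the bidirectional adversarial objectives discourages mappings that change a digit "3" into something the source classifier reads as "8", even when the image looks stylistically plausible.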
  (3) Aiming at the problem that global domain adaptation methods, which ignore the variable-length fine-grained character information in text images, might struggle when dealing with sequence-like text images, a novel sequence-to-sequence domain adaptation network is proposed to learn "where to adapt" and "how to align" the sequential image at the fine-grained local level. The key idea is to adaptively mine the local regions that contain characters via an attention mechanism, and to align them across domains effectively through a novel gated attention similarity unit.
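The "where to adapt" idea can be caricatured as follows. This is a hand-rolled numpy sketch, not the thesis's gated attention similarity unit: attention weights select character regions from a frame-level feature sequence, and the resulting per-step glimpses from source and target are compared, with a simple mean squared distance standing in for the adversarial alignment.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attend(features, scores):
    """Attention-weighted glimpses: features is (T, D), scores is (steps, T).
    Each decoding step pools the frames likely to contain one character,
    so glimpses carry character-level rather than whole-image information."""
    weights = softmax(scores)          # (steps, T), each row sums to 1
    return weights @ features          # (steps, D) character-level glimpses

def local_alignment_distance(src_glimpses, tgt_glimpses):
    """Stand-in for adversarial local alignment: mean squared distance
    between per-step character features of the two domains."""
    return float(np.mean((src_glimpses - tgt_glimpses) ** 2))

rng = np.random.default_rng(1)
src_feats = rng.normal(size=(12, 8))                  # 12 frames, 8-dim features
tgt_feats = src_feats + 0.1 * rng.normal(size=(12, 8))
scores = rng.normal(size=(3, 12))                     # 3 character decoding steps
d = local_alignment_distance(attend(src_feats, scores), attend(tgt_feats, scores))
print(d)
```

Because the distance is computed per decoding step rather than per image, sequences of different lengths pose no problem: each step contributes one fixed-size glimpse regardless of the image width.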
  (4) Aiming at the problem of insufficient knowledge transfer for sequential text images in complex scenes, especially irregular text recognition and cross-domain transfer tasks with relatively large domain gaps, a spatial normalization transformation module is introduced to adaptively reduce the spatial distortions in irregular text images, making the model robust and generalizable to more complex scenes. Two joint domain adaptation modules are then designed to alleviate the domain shift at both the global and local levels, collaboratively guiding the model to find domain-invariant representations more effectively.
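How the two adaptation granularities might be combined can be sketched as a weighted sum (an illustrative weighting only; the thesis's modules are adversarial discriminators, not the toy mean-feature gaps used here, and the weights `w_global`/`w_local` are assumed names):

```python
import numpy as np

def mean_feature_gap(a, b):
    """Toy domain distance: squared gap between the two domains' mean features."""
    return float(np.sum((a.mean(axis=0) - b.mean(axis=0)) ** 2))

def joint_adaptation_loss(src_global, tgt_global, src_local, tgt_local,
                          w_global=1.0, w_local=1.0):
    """Weighted sum of a coarse-grained global gap (whole-image context,
    e.g. background style) and a fine-grained local gap (character-level
    glimpses), mirroring the idea of collaborating global- and local-level
    adaptation modules applied after spatial rectification."""
    return (w_global * mean_feature_gap(src_global, tgt_global)
            + w_local * mean_feature_gap(src_local, tgt_local))

rng = np.random.default_rng(2)
sg, tg = rng.normal(size=(16, 4)), rng.normal(size=(16, 4))   # global features
sl, tl = rng.normal(size=(64, 4)), rng.normal(size=(64, 4))   # local glimpses
print(joint_adaptation_loss(sg, tg, sl, tl))
```

Rectifying irregular text first (the spatial normalization step) shrinks the geometric part of the gap, so the two adaptation terms only have to account for the remaining appearance and style differences.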
 

Pages: 140
Language: 中文 (Chinese)
Document Type: 学位论文 (Dissertation)
Identifier: http://ir.ia.ac.cn/handle/173211/39228
Collection: 毕业生_博士学位论文 (Graduates / Doctoral Dissertations)
Recommended Citation
GB/T 7714
张亚萍. 基于迁移学习的鲁棒性图像文字识别[D]. 中国科学院大学, 2020.
Files in This Item:
File Name/Size: Thesis-v1.10.pdf (4419 KB) | DocType: 学位论文 (Dissertation) | Access: 限制开放 (Restricted Access) | License: CC BY-NC-SA