Research on Handwritten Text Image Generation Methods Based on Adversarial Mechanisms (基于对抗机制的手写文本图像生成方法研究)

	Research on Handwritten Text Image Generation Methods Based on Adversarial Mechanisms
	李硕
	2023-05
Pages	74
Degree type	Master's
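Chinese abstract (translated)	Text is one of the most important forms of communication and plays a major role in daily work and life. With the continuous development of deep learning, deep models trained on massive data have excelled in document analysis, text recognition, and handwriting imitation. Among these areas, the generation and recognition of handwritten Chinese text images is an important research direction in computer vision with broad application prospects. It also faces many challenges. First, Chinese characters are vast in number and diverse in writing style, which severely limits improvements in handwritten text recognition accuracy. Second, the variability of handwriting styles and the topological differences between fonts further hinder the generation of handwritten Chinese text that looks close to real writing. To address these difficulties, this thesis studies handwritten Chinese text image generation based on generative adversarial networks. The main contributions are twofold: 1) A handwritten text line generation method based on component decomposition and fine-grained supervision is proposed. The method designs a text line generation model, HCT-GAN, which introduces a content encoding module suited to sequential Chinese character generation and a fine-grained content recognition network to guide the generation of readable handwritten text images. For content encoding, the thesis draws on prior knowledge of Chinese character components and structures and proposes the Chinese text encoder (CTE), which encodes Chinese text by learning latent content representations of the components shared among characters. This component-reuse encoding raises the information density of the content encoding and indirectly expands the training samples. For fine-grained supervision, the thesis further proposes a sequence recognition module (SRM) and a spatial perception module (SPM), which guide the generation of clearer, sharper strokes at the component level and provide sequence-level constraints and adaptive spatial correlation constraints, so that characters with complex topological structures can be generated. With this design, the proposed model can generate handwritten Chinese text line images of arbitrary length and diverse styles. The generated results can also be used for data augmentation to improve handwritten text recognition (HTR) accuracy. Extensive experiments show that the model achieves state-of-the-art performance in generating handwritten Chinese text lines. 2) A handwriting imitation method for handwritten Chinese text based on multi-module collaboration is proposed. Because single-character style transfer for Chinese characters is poorly suited to handwriting imitation, the thesis proposes an effective generative model, CG-GAN. The model extends Chinese character image generation from the widely adopted single-character setting to sequential character generation, that is, it directly generates handwritten Chinese text lines. For style extraction, the model introduces a style encoding network that implicitly encodes font style, structural correlation, and density features in a style vector, so that the specified style is captured. For text density feature extraction, the model introduces a trainable sequential text density encoder that adjusts the content encoding through style features and explicitly encodes density features such as character width and spacing into the content embedding. With this design, the proposed model attends simultaneously to the font style of the reference handwriting, the structural correlation between adjacent characters, and the density features, achieving imitation that is nearly indistinguishable from genuine handwriting. Experimental results show that the model performs strongly in handwriting imitation.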
English abstract	Text is an essential form of communication and plays a crucial role in daily work and life. With the continuous development of deep learning, deep neural networks (DNNs) trained on large-scale data have excelled in fields such as document analysis, text recognition, and handwriting imitation. Among these, the generation and recognition of handwritten Chinese text images is an important research topic in computer vision, with broad application prospects but also many challenges. On the one hand, the vast number of Chinese characters and their diverse writing styles severely limit the accuracy of handwriting recognition. On the other hand, the variability of handwriting styles and the topological differences between fonts make it extremely difficult to generate realistic handwritten Chinese text. To address these challenges, this thesis studies the generation of handwritten Chinese text images based on generative adversarial networks (GANs). The main contributions are twofold: 
1) A method for generating handwritten text lines based on component decomposition and fine-grained supervision is proposed. The method introduces a text line generation model called HCT-GAN, which combines a content encoding module suited to sequential Chinese character generation with a fine-grained content recognition network to guide the generation of readable handwritten text images. Specifically, a Chinese text encoder (CTE) is proposed that leverages prior knowledge of Chinese character components and structures and encodes Chinese text by learning latent content representations of the components shared among characters. This component-reuse encoding increases the information density of the content encoding and indirectly expands the training samples. In addition, a sequence recognition module (SRM) and a spatial perception module (SPM) are proposed for fine-grained supervision; they guide the generation of clearer and sharper strokes at the component level and provide sequence-level constraints and adaptive spatial correlation constraints for generating characters with complex topology. Benefiting from component reuse and fine-grained supervision, HCT-GAN can generate handwritten Chinese text line images of arbitrary length and diverse styles. Experimental results show that HCT-GAN achieves state-of-the-art performance in handwritten Chinese text line generation. The model can also generate new handwritten text images with specified content and various styles for data augmentation, thereby improving handwritten text recognition (HTR). 
2) A handwriting imitation method for handwritten Chinese text based on multi-module collaboration is proposed. Because single-character style transfer is poorly suited to handwriting imitation, an effective generative model, CG-GAN, is proposed. The model extends Chinese character image generation from the widely used single-character setting to sequential character generation, directly generating handwritten Chinese text lines. For style extraction, the model introduces a style encoding network that implicitly encodes font style, structural correlation, and density features in a style vector. For text density feature extraction, the model introduces a trainable sequential text density encoder that adjusts the content embedding through style features and explicitly encodes density features such as character width and spacing into the content embedding. Benefiting from the joint adjustment of multiple modules, the proposed model attends simultaneously to the font style of the reference handwriting, the structural correlation between adjacent characters, and the density features, achieving imitation that is nearly indistinguishable from genuine handwriting. Experimental results show that the model performs strongly in handwriting imitation.
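Keywords	generative models, adversarial learning, handwritten text image generation, data augmentation, Chinese handwriting imitation
Language	Chinese
Seven major directions: sub-direction classification	Text recognition and document analysis
State Key Laboratory planned direction classification	Visual information processing
Associated dataset requiring deposit	No
Document type	Thesis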
Item identifier	http://ir.ia.ac.cn/handle/173211/52079
Collection	Graduates_Master's Theses
Recommended citation (GB/T 7714)	李硕. 基于对抗机制的手写文本图像生成方法研究[D],2023.

Files in this item
File name/size	Document type	Version type	Access type	License
硕士毕业论文_李硕.pdf (6840KB)	Thesis		Restricted access	CC BY-NC-SA