Research on Zero-Shot Chinese Character Recognition Methods

	Research on Zero-Shot Chinese Character Recognition Methods
	敖翔
	2024-05
Pages	98
Degree type	Doctoral
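Chinese abstract (translated)	Chinese character recognition faces several challenges: the number of character classes is huge, samples of rare characters are hard to collect, and the class set expands dynamically. To address the lack of samples for new classes, zero-shot Chinese character recognition methods aim to recognize characters never seen during training by using auxiliary information. This lowers data collection requirements and supports recognition of new characters in open environments. This thesis makes the following contributions to zero-shot Chinese character recognition:
1. A zero-shot Chinese character recognition method based on cross-modal prototype learning. Inspired by how humans recognize unseen handwritten characters from printed templates, the thesis proposes a cross-modal prototype learning method to handle the cross-modal alignment between online handwriting trajectories and printed templates. The method embeds the two modalities into a common space with different deep neural networks, treats the printed template as the class prototype of the online handwriting, and trains the two modalities jointly through prototype learning. At test time, new handwritten characters can be recognized from the printed prototypes alone. Experiments on public datasets show good zero-shot recognition performance on online handwritten characters.
2. A cross-domain zero-shot character recognition framework that fuses online and offline handwriting. Building on cross-modal prototype learning, the method provides a zero-shot character recognition framework that unifies the online and offline handwriting modalities. It jointly processes three modalities (online handwriting trajectories, offline handwriting images, and printed templates) without requiring sample-by-sample correspondence between online and offline data; joint learning is achieved solely by sharing one set of printed prototypes between the two. Based on the printed prototypes, both online and offline handwritten characters can be recognized, and the framework can also be used for cross-domain text recognition. Experiments show that online-offline fusion improves the zero-shot recognition ability of each modality and that the method generalizes well in cross-domain zero-shot settings, both cross-language and modern-to-ancient.
3. A zero-shot character recognition method based on sample synthesis and classifier calibration. To address domain shift on unseen classes, the method trains, on seen classes, a sample generator conditioned on printed character images to synthesize samples of unseen characters, and uses these synthesized samples to calibrate the shifted prototypes of unseen classes. Calibration requires no additional training and adapts quickly to unseen characters. The method further extends the prototype classifier to a Bayesian classifier under a Gaussian density assumption, proposes a Bayesian classifier calibration method to fit the true feature distribution of unseen classes, and classifies under the maximum a posteriori criterion. Bayesian classifier calibration further improves recognition performance over prototype classifier calibration. Quantitative analysis in the feature space shows that the distribution after Bayesian classifier calibration is closer to the true distribution of the unseen classes.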
English abstract	Chinese character recognition faces significant challenges due to the vast number of character categories, the difficulty of collecting samples of rare characters, and the dynamic expansion of the category set. To address the scarcity of samples for new categories, zero-shot Chinese character recognition methods aim to recognize unseen characters, which have no samples during training, by means of auxiliary information. They reduce data collection requirements and support the recognition of new characters in an open environment. This thesis makes the following contributions to zero-shot Chinese character recognition:
1. A cross-modal prototype learning method for zero-shot Chinese character recognition is proposed. Inspired by the human ability to recognize unseen handwritten Chinese characters via printed templates, we propose a cross-modal prototype learning method to align online handwriting trajectories with printed templates. Different deep neural networks embed the two modalities into a shared space, and the printed template of each category is treated as the prototype of that category's online handwriting samples. Joint training of the two modalities is realized through prototype learning. During testing, new handwritten characters are recognized by means of the printed prototypes. Experiments on public datasets demonstrate promising generalization performance in zero-shot online handwritten character recognition.
2. A unified cross-domain zero-shot character recognition framework integrating the online and offline modalities is proposed. Based on cross-modal prototype learning, this method achieves unified zero-shot character recognition across the online and offline handwriting modalities. It jointly processes online handwriting trajectories, offline handwriting images, and printed templates without requiring a one-to-one correspondence between online and offline samples. Joint learning is realized by sharing one set of printed prototypes between the online and offline modalities, and both online and offline handwritten characters can be recognized based on these printed prototypes. The framework also applies to cross-domain character recognition. Experimental results demonstrate that online-offline integration enhances the zero-shot recognition capability of each modality and that the method generalizes well in cross-domain zero-shot settings, both cross-language and modern-to-ancient.
3. A method based on sample synthesis and classifier calibration for zero-shot Chinese character recognition is proposed. To address the domain shift on unseen classes, this method trains, on seen classes, a sample generator conditioned on printed character images and uses it to synthesize samples for unseen characters. The synthesized samples are then used to calibrate the shifted prototypes of unseen classes. The calibration process requires no additional training and adapts quickly to unseen characters. Furthermore, the method extends the prototype classifier into a Bayesian classifier based on a Gaussian density assumption and introduces Bayesian classifier calibration to fit the true feature distribution of unseen classes, with classification performed under the maximum a posteriori (MAP) criterion. Compared with prototype classifier calibration, Bayesian classifier calibration further improves recognition performance. Quantitative analysis in the feature space indicates that the distribution after Bayesian classifier calibration better approximates the true distribution of the unseen classes.
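The classification ideas named in the abstract (nearest printed prototype, prototype calibration with synthesized samples, and a Gaussian Bayesian classifier with a maximum a posteriori decision) can be illustrated with a small sketch. This is not the thesis's implementation: the 2-D feature vectors stand in for the outputs of the deep embedding networks, the interpolation rule and its `weight` parameter are assumptions, the diagonal covariance is a simplification, and all function names are hypothetical.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_prototype(feature, prototypes):
    """Return the label whose prototype is closest to the feature."""
    return min(prototypes, key=lambda label: sq_dist(feature, prototypes[label]))

def mean_vector(features):
    """Per-dimension mean of a list of feature vectors."""
    n = len(features)
    return [sum(f[d] for f in features) / n for d in range(len(features[0]))]

def calibrate_prototype(prototype, synthetic, weight=0.5):
    """Shift a printed prototype toward the mean of synthesized features.
    The linear interpolation and `weight` are assumptions for illustration."""
    mean = mean_vector(synthetic)
    return [(1 - weight) * p + weight * m for p, m in zip(prototype, mean)]

def fit_diag_gaussian(features, eps=1e-3):
    """Estimate mean and diagonal variance; eps keeps variances positive."""
    mean = mean_vector(features)
    n = len(features)
    var = [sum((f[d] - mean[d]) ** 2 for f in features) / n + eps
           for d in range(len(mean))]
    return mean, var

def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def map_classify(x, gaussians, log_priors=None):
    """Pick the class maximizing log prior + log likelihood (equal priors by default)."""
    def score(label):
        prior = log_priors[label] if log_priors else 0.0
        return prior + log_gaussian(x, *gaussians[label])
    return max(gaussians, key=score)

# Toy embeddings (made up): printed prototypes and synthesized handwriting features.
printed = {"A": [0.0, 0.0], "B": [4.0, 0.0]}
synthetic = {"A": [[0.1, 0.0], [-0.1, 0.1], [0.0, -0.1]],
             "B": [[1.6, 0.0], [2.0, 0.2], [1.8, -0.2]]}
query = [1.8, 0.1]  # a handwritten "B" that has drifted away from its printed prototype

calibrated = {c: calibrate_prototype(printed[c], synthetic[c]) for c in printed}
gaussians = {c: fit_diag_gaussian(synthetic[c]) for c in synthetic}

print(nearest_prototype(query, printed))     # A (wrong: the "B" prototype is shifted)
print(nearest_prototype(query, calibrated))  # B
print(map_classify(query, gaussians))        # B
```

In this toy setting the uncalibrated printed prototype misclassifies the drifted query, while both the calibrated prototype and the Gaussian classifier recover the intended class. Neither step involves gradient-based training, which matches the abstract's statement that calibration needs no additional training.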
Keywords	Chinese character recognition; zero-shot; cross-modal; prototype learning; classifier calibration
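Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/56729
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems_Pattern Analysis and Learning; Graduates_Doctoral Theses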
Recommended citation (GB/T 7714)	敖翔. 零样本中文字符识别方法研究 [Research on Zero-Shot Chinese Character Recognition Methods][D], 2024.

Files in this item
File name/size	Document type	Version type	Access type	License
Thesis.pdf (6485KB)	Thesis		Open access	CC BY-NC-SA