On-line Handwritten Chinese Character Recognition Based on Statistical Radical Models


Title (original)	基于统计部首模型的联机手写汉字识别
Alternative title	On-line Handwritten Chinese Character Recognition Based on Statistical Radical Models
Author	马龙龙 (Ma Longlong)
Date	2010-06-01
Degree type	Doctor of Engineering
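Chinese abstract (translated)	On-line handwritten Chinese character recognition is widely used for Chinese character input on computers and hand-held mobile devices (such as mobile phones and PDAs), pen-based document analysis, and human-computer interaction. As pen input devices become more common and their applications expand, higher demands are placed on recognition performance; further improving recognition accuracy while reducing computation and storage is the next research goal. Radical-based character recognition has long attracted researchers' interest, because exploiting the hierarchical radical structure of Chinese characters helps reduce the storage required by the character recognizer and improves its generalization and adaptability. Radical segmentation, however, has remained a difficult problem. This thesis proposes a new radical-based method for on-line handwritten Chinese character recognition that combines the advantages of statistical methods and radical-based structural methods. The main work covers the following aspects. First, a radical model database suited to computer recognition is created, comprising special radicals and ordinary radicals. To improve the efficiency and accuracy of radical segmentation, we do not adopt the linguistic definition of radicals but redefine radical classes that are easy to separate; the statistical radical models are learned automatically by clustering radicals extracted from training samples. Second, for characters of non-special structure types, a nested radical segmentation method is introduced. Following the idea of integrated segmentation and recognition, the method integrates radical shape information and geometric information into the recognition framework, and uses a hierarchical character-radical dictionary to guide radical segmentation and recognition during the combinatorial search, thereby improving radical segmentation accuracy. To handle connected strokes between radicals, corner-point detection is introduced to extract sub-strokes. For character recognition, two different dictionary representations are used, each with its own search algorithm. Experimental results demonstrate the effectiveness of the method. Third, a special-radical detection method based on statistical classification is proposed. Multiple candidate radicals are obtained by exploiting the separability between a special radical and the remaining part and by introducing prior rules for each special radical class; 19 two-class support vector machine (SVM) classifiers then decide whether each candidate is a special radical. The remaining part is recognized with the method for characters of non-special structure. Experimental results demonstrate the effectiveness of the detection method. Fourth, a unified recognition framework for characters of non-special and special structures is implemented, and the system's recognition performance is further improved by three techniques: direct recognition in confident regions, selection of the better result in confusion regions, and rejection based on error detection.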
English abstract	Online handwritten Chinese character recognition (OLHCCR) is widely used on computers and hand-held devices, such as mobile phones and PDAs, for Chinese character input, pen-based document analysis, and so on. To meet the demand for high recognition performance, researchers are working towards high-accuracy recognition methods with lower complexity and smaller storage requirements. Radical-based character recognition has long attracted intense interest because it can exploit the hierarchical radical structure of Chinese characters to reduce the number of parameters and improve generalization ability and adaptability. However, segmenting radicals from characters has long been a difficult problem. We propose a new radical-based recognition approach, which combines the merits of hierarchical structure and appearance-based statistical classification of radicals. The main contributions of this work are as follows. First, we establish a radical model database from the viewpoint of computer segmentation and recognition. Rather than adopting the linguistic definitions of radicals, we define radicals that are easier to segment from characters. The parameters of the statistical radical models are estimated by automatic learning with embedded segmentation on training character samples. Second, for characters of non-special structures, we propose a new radical-based approach based on integrated segmentation and recognition of radicals. The approach integrates appearance-based radical recognition and geometric context into a principled framework, using a hierarchical character-radical dictionary to guide radical segmentation and recognition during path search. To handle connected strokes between radicals, corner points are detected to extract sub-strokes. For character recognition, we use two dictionary representation schemes, each with its own search algorithm. The effectiveness of the proposed approach has been demonstrated on Chinese characters of non-special structures. Third, we propose a statistical-classification-based method for detecting special radicals in characters of special structures. We design 19 binary support vector machine (SVM) classifiers for classifying candidate radicals (groups of strokes), which are obtained based on the eligibility of the special radical class and the separation between the radical and the remaining part. After detecting the special radical, the remaining part is assumed to be non-special s...
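The dictionary-guided segmentation and recognition described in the abstract can be illustrated with a small dynamic-programming sketch. This is not the thesis's implementation: the record gives no details of its radical classifier, geometric context terms, or search algorithms. Everything below is an assumption made for illustration, including the symbolic sub-stroke labels (`h`, `v`, `l`, `r`, `hv`), the `TEMPLATES` and `LEXICON` tables, and the mismatch-counting `toy_score` function, which stands in for a statistical radical model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy data: each radical is a template of symbolic sub-stroke labels.
TEMPLATES: Dict[str, Tuple[str, ...]] = {
    "木": ("h", "v", "l", "r"),
    "口": ("v", "hv", "h"),
}
# Hypothetical character-radical dictionary: character -> ordered radicals.
LEXICON: Dict[str, Tuple[str, ...]] = {
    "林": ("木", "木"),
    "杏": ("木", "口"),
    "呆": ("口", "木"),
}

def toy_score(radical: str, group: Tuple[str, ...]) -> float:
    """Stand-in for a statistical radical model: negative mismatch count."""
    template = TEMPLATES[radical]
    mismatches = sum(a != b for a, b in zip(template, group))
    return -float(mismatches + abs(len(template) - len(group)))

def match_character(
    segments: Sequence[str],
    radicals: Sequence[str],
    score: Callable[[str, Tuple[str, ...]], float],
) -> Tuple[float, List[Tuple[int, int]]]:
    """Split `segments` into consecutive groups, one per radical, maximizing total score."""
    n, m = len(segments), len(radicals)
    neg = float("-inf")
    # best[i][j]: best score explaining the first i segments with the first j radicals.
    best = [[neg] * (m + 1) for _ in range(n + 1)]
    back = [[0] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for j in range(1, m + 1):
        for i in range(j, n + 1):
            for k in range(j - 1, i):
                if best[k][j - 1] == neg:
                    continue
                s = best[k][j - 1] + score(radicals[j - 1], tuple(segments[k:i]))
                if s > best[i][j]:
                    best[i][j], back[i][j] = s, k
    if best[n][m] == neg:
        return neg, []
    # Trace back the segment range assigned to each radical.
    cuts, i = [], n
    for j in range(m, 0, -1):
        k = back[i][j]
        cuts.append((k, i))
        i = k
    cuts.reverse()
    return best[n][m], cuts

def recognize(segments, lexicon, score):
    """Try every dictionary entry and keep the character with the best-scoring split."""
    results = {ch: match_character(segments, rads, score) for ch, rads in lexicon.items()}
    best_char = max(results, key=lambda ch: results[ch][0])
    return best_char, results[best_char]
```

Calling `recognize(["h", "v", "l", "r", "v", "hv", "h"], LEXICON, toy_score)` scores each dictionary entry against the same sub-stroke sequence, so segmentation boundaries are chosen jointly with the character hypothesis rather than fixed beforehand.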
Keywords	On-line Handwritten Chinese Character Recognition; Hierarchical Structure; Radical Over-segmentation; Radical Recognition; Path Search; Special Radical Detection
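Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6280
Collection	Graduates_Doctoral Theses
Recommended citation (GB/T 7714)	马龙龙 (Ma Longlong). 基于统计部首模型的联机手写汉字识别 [On-line Handwritten Chinese Character Recognition Based on Statistical Radical Models][D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences, 2010.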

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20061801462805 (3218 KB)			Restricted access	CC BY-NC-SA