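Alternative Title: Research of Form and Handwritten Text Processing
Thesis Advisor: 戴汝为
Degree Grantor: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Place of Conferral: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Degree Discipline: Pattern Recognition and Intelligent Systems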
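Abstract: For Chinese character recognition to become practical, effort must go into feature extraction and recognition methods to raise the single-character recognition rate, and the problems of understanding the layout of the input text and of segmenting characters must also be solved. Research on Chinese character recognition has long concentrated on the former: new methods keep emerging, and single-character recognition is maturing. The importance of the latter has received far less attention than it deserves, so its shortcomings have become increasingly apparent and are now the main obstacle to practical Chinese character recognition. This thesis therefore takes the latter as its main focus, while also exploring the former.
The main contributions of the thesis are:
1. The understanding of forms, a special but widely used kind of document, is studied in depth. A complete set of form processing algorithms is proposed, covering form learning, form recognition, and form registration, and a practical interactive form information processing system is developed.
2. Character segmentation is studied systematically. A rule-based split-merge algorithm is proposed for segmenting handwritten Chinese text; it adapts to some degree to the irregularity of such text and has some resistance to noise. The method is used in the "Software Engineering Handwritten Character and Form Recognition System" with fairly good results.
3. The selection and extraction of the various kinds of features are summarized comprehensively. A method for extracting stroke segments is given, including a new corner-point detection method that locates corner points fairly accurately in a single pass, with no further split-merge processing.
4. The recognition methods for Chinese characters are analyzed and summarized. One of the more successful, the relaxation method, is studied and surveyed in detail, and a handwritten Chinese character recognition experiment is carried out using relaxation matching with stroke segments as primitives.
5. The interface of the "Software Engineering Handwritten Character and Form Recognition System" is designed and implemented.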
Other Abstract: To bring Chinese character recognition into practical use, great effort should be made on feature extraction and classification to raise the recognition rate of isolated characters, and the problems of understanding the input document's layout and of character segmentation should also be solved well. For a long time, research on Chinese character recognition has focused on the former, with new methods emerging rapidly, and the recognition of isolated characters is becoming mature. By comparison, the latter has not received as much attention as it should. Its shortcomings have therefore become prominent and are now the main factors limiting the application of Chinese character recognition. With this in mind, the author focused the main work of this thesis on the latter, and useful research was also done on the former.
The main contributions of the thesis are the following:
1. Meticulous and in-depth work has been done on the understanding of forms, a special but widely used kind of document. An integrated set of form processing algorithms, including form learning, form recognition, and form registration, is proposed. A practical interactive form processing system is developed.
2. A systematic study of character segmentation is made. A rule-based split-merge algorithm is proposed to segment handwritten text. It adapts to the variation of such text and is insensitive to noise. The algorithm is used in the Software Engineering Handwritten Character and Form Recognition System and has given fairly good results.
3. A comprehensive review of feature selection and extraction is made. A sub-stroke extraction method is proposed, including a new corner point identification method that finds the corner points on the skeleton of a Chinese character correctly in one pass and needs no further split-merge processing.
4. The classification methods are discussed and summarized, with a dedicated survey of the relaxation method. An experiment on handwritten Chinese character recognition using sub-strokes as elements and the relaxation method is also carried out.
5. A friendly interface is designed and developed for the Software Engineering Handwritten Character and Form Recognition System.
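The abstract names a rule-based split-merge algorithm for character segmentation but does not state its rules. As a rough illustration of the general split-merge idea only, and not the thesis's actual method, the sketch below splits a text line at empty columns of its vertical projection profile and then merges neighbouring pieces that together still fit within one character width. The function names `split_on_gaps` and `merge_narrow` and the parameters `char_width` and `max_gap` are hypothetical, chosen for this example.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # half-open column range [start, end)


def split_on_gaps(profile: List[int]) -> List[Segment]:
    """Split step: cut the line wherever a column contains no ink."""
    segments: List[Segment] = []
    start = None
    for x, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = x                      # a run of inked columns begins
        elif ink == 0 and start is not None:
            segments.append((start, x))    # the run ends at an empty column
            start = None
    if start is not None:                  # run reaches the right edge
        segments.append((start, len(profile)))
    return segments


def merge_narrow(segments: List[Segment], char_width: int, max_gap: int) -> List[Segment]:
    """Merge step (assumed rule): join neighbours separated by a small gap
    when the combined piece is no wider than one expected character."""
    merged: List[Segment] = []
    for seg in segments:
        if merged:
            prev_start, prev_end = merged[-1]
            gap = seg[0] - prev_end
            combined = seg[1] - prev_start
            if gap <= max_gap and combined <= char_width:
                merged[-1] = (prev_start, seg[1])
                continue
        merged.append(seg)
    return merged


# Ink count per pixel column of a toy text line (made-up data).
profile = [0, 3, 3, 0, 4, 4, 0, 0, 0, 5, 5, 5, 5, 5, 0]
pieces = split_on_gaps(profile)
characters = merge_narrow(pieces, char_width=6, max_gap=1)
print(pieces)
print(characters)
```

In this toy line the first two pieces are close together and jointly narrower than `char_width`, so they are merged into one character candidate, as happens with a Chinese character made of separate left and right components. A practical system would need further rules, for example for touching characters, which this sketch does not attempt.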
Other Identifier: 425
Document Type: Thesis
Recommended Citation
GB/T 7714
肖国芳. 表格及手写文本处理技术研究[D]. 中国科学院自动化研究所. 中国科学院自动化研究所,1997.
