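Research on Recognition Methods for Printed Korean Script (印刷体朝鲜文识别方法研究)
Alternative title: Research of Printed Korean Character Recognition
许日俊
Degree type: Master of Engineering
Advisor: 刘昌平
2005-05-15
Degree-granting institution: Graduate School of the Chinese Academy of Sciences (中国科学院研究生院)
Place of degree conferral: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Major: Pattern Recognition and Intelligent Systems
Keywords: Hangul Recognition; Grapheme Segmentation; Consonant; Vowel; Post-processing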
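Abstract: Korean script (Hangul) is a writing system whose characters are built from basic consonant and vowel letters. It has much in common with Chinese characters, so some of the theory used in Chinese character recognition can also be applied to Hangul recognition. Depending on the type of vowel letter and on whether a final consonant is present, Hangul characters fall into 6 structural types, and in theory more than 11,000 characters can be formed. Similar-looking characters are widespread in Hangul, and this has seriously hindered the development of Hangul recognition technology. To reduce the complexity of character recognition, this thesis proposes a letter-based recognition method. Starting from the candidate characters produced by coarse classification, it separates the basic letters that make up a character with a background-thinning method, extracts two-layer peripheral distance features, and recognizes the letters with a neural network and structural analysis. It then studies fine classification of Hangul characters based on the actual candidates and on the compositional properties of Hangul. In addition, post-recognition processing was tested using an existing table of Hangul word statistics, with fairly good results. The main work of the thesis is as follows:
(1) The structural characteristics of Hangul characters are analyzed, and vertical and horizontal projection histograms are used to determine the regions for background thinning. Thinning these background regions yields the dividing lines between letters, along which each letter is separated.
(2) Two-layer peripheral distance features are extracted from the separated letters, and a three-layer BP neural network is built with these feature vectors as input. Letters are then recognized using the neural network together with structural characteristics, and the recognized character is decided by analyzing the candidate set returned by an existing printed Hangul recognition system. Recognition was studied on similar-character candidate groups from four commonly used printed Hangul typefaces.
(3) A preliminary post-recognition processing system is built. Using a bidirectional search method, it retrieves substantive words and attached function words from the table of Hangul word statistics and corrects misrecognized words in a sentence, which improves the recognition system to some extent.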
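The figure of "more than 11,000" characters and the 6 structural types can be illustrated with a short sketch. It relies on the Unicode arithmetic for precomposed Hangul syllables (19 initial consonants, 21 vowels, and 28 final slots including "no final"), which gives 11,172 combinations. The grouping of vowels into vertical, horizontal, and compound shapes and the numbering of the types 1 to 6 are assumptions made for illustration; the thesis may define its types differently. The function names are hypothetical.

```python
# Sketch only: Unicode-based decomposition of a precomposed Hangul syllable.
HANGUL_BASE = 0xAC00
N_INITIAL, N_MEDIAL, N_FINAL = 19, 21, 28  # 28 includes "no final consonant"
N_SYLLABLES = N_INITIAL * N_MEDIAL * N_FINAL  # 11172

# Assumed vowel grouping by Unicode medial index (not taken from the thesis).
VERTICAL = {0, 1, 2, 3, 4, 5, 6, 7, 20}   # vowels written to the right, e.g. index 0
HORIZONTAL = {8, 12, 13, 17, 18}          # vowels written below, e.g. index 8
# Every other medial index is treated as a compound (below + right) vowel.


def decompose(syllable):
    """Return (initial, medial, final) indices; final == 0 means no final."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < N_SYLLABLES:
        raise ValueError("not a precomposed Hangul syllable")
    initial, rest = divmod(offset, N_MEDIAL * N_FINAL)
    medial, final = divmod(rest, N_FINAL)
    return initial, medial, final


def structure_type(syllable):
    """Assumed numbering: 1-3 = no final consonant, 4-6 = with final."""
    _, medial, final = decompose(syllable)
    if medial in VERTICAL:
        shape = 1
    elif medial in HORIZONTAL:
        shape = 2
    else:
        shape = 3
    return shape + (3 if final else 0)
```

For example, `decompose("한")` returns `(18, 0, 4)`, and `structure_type("한")` returns `4` under the assumed numbering (vertical vowel with a final consonant).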
Alternative abstract: Hangul (Korean) is a script whose characters are composed of consonants and vowels. Because Hangul characters resemble Chinese characters in many respects, some methods used in Chinese character recognition can also be applied to Hangul recognition. Hangul characters can be classified into 6 types according to the form of the vowel and the presence of a final consonant, which yields over 11,000 possible characters, many of which look remarkably similar. To reduce the complexity of character recognition, this thesis separates the graphemes of each character and identifies the separated graphemes independently. Building on an existing Hangul recognition system, a background-thinning technique is proposed to separate graphemes, and the separated graphemes are then recognized by a neural network classifier using peripheral features. Finally, a character is recognized by combining the recognized graphemes with the candidate information. Furthermore, an efficient post-processing method based on Hangul word statistics is proposed. The main points of this thesis are:
1. By analyzing the structure of Hangul characters, horizontal and vertical projection histograms are used to determine the background regions to be thinned. Thinning the background region of the character image then yields the segmentation lines between graphemes, along which each grapheme is separated.
2. A three-layer BP neural network is trained on peripheral feature vectors extracted from the grapheme images. Consonants and vowels are then recognized with the neural network and structural analysis, and similar characters are distinguished by analyzing the candidate groups of similar Hangul characters in the 4 most frequently used printed Hangul fonts.
3. Misrecognized words are found and corrected by searching for substantive and empty words in the Hangul word statistics with a two-direction search method. This post-processing method improved the recognition accuracy of character classification.
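The projection-histogram step in point 1 can be illustrated with a minimal sketch. It assumes a binary image stored as a list of rows, with 1 for ink and 0 for background, and it treats runs of zero-valued profile entries as candidate background regions. That criterion, the toy image, and the function names are assumptions for illustration only. The abstract does not state how the thinning regions are chosen, and the background thinning itself is not shown here.

```python
# Sketch only: horizontal and vertical projection profiles of a binary glyph.
TOY_GLYPH = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]


def projection_profiles(image):
    """Return (ink count per row, ink count per column)."""
    horizontal = [sum(row) for row in image]
    vertical = [sum(col) for col in zip(*image)]
    return horizontal, vertical


def empty_runs(profile):
    """Return inclusive (start, end) index pairs of consecutive zero entries."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value == 0 and start is None:
            start = i                      # a background run begins
        elif value != 0 and start is not None:
            runs.append((start, i - 1))    # the run ended at the previous index
            start = None
    if start is not None:                  # run reaches the end of the profile
        runs.append((start, len(profile) - 1))
    return runs


rows, cols = projection_profiles(TOY_GLYPH)
print(rows, cols)         # [3, 2, 3, 1] [3, 2, 0, 0, 4]
print(empty_runs(cols))   # [(2, 3)]
```

In the toy glyph, columns 2 and 3 contain no ink, so they form the only candidate gap between the left component and the vertical stroke on the right. Real printed Hangul graphemes often touch or overlap in projection, which is presumably why the thesis follows this step with background thinning rather than cutting at the gap directly.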
Call number: XWLW863
Other identifier: 200228014603567
Language: Chinese
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/6893
Collection: Graduates_Master's theses
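Recommended citation
GB/T 7714
许日俊. Research of Printed Korean Character Recognition (印刷体朝鲜文识别方法研究)[D]. Institute of Automation, Chinese Academy of Sciences. Graduate School of the Chinese Academy of Sciences, 2005.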