Research on Recognition Methods for Printed Korean Characters (印刷体朝鲜文识别方法研究)
Alternative Title: Research of Printed Korean Character Recognition
许日俊
Subtype: Master of Engineering
Thesis Advisor: 刘昌平
2005-05-15
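Degree Grantor: Graduate School of the Chinese Academy of Sciences (中国科学院研究生院)
Place of Conferral: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: Hangul Recognition; Grapheme Segmentation; Consonant; Vowel; Post-processing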
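Abstract: Korean script (Hangul) is built from basic consonant and vowel letters. It has much in common with Chinese characters, so some of the theory used in Chinese character recognition can also be applied to Korean character recognition. Depending on the type of vowel letter and on whether a final consonant is present, Korean characters fall into 6 structural types, and in theory more than 11000 characters can be formed. Similar-looking characters are widespread in Korean, and this has seriously hindered the development of Korean recognition technology. To reduce the complexity of character recognition, this thesis proposes a letter-based (grapheme-based) recognition method. Starting from the candidate characters produced by coarse classification, the basic letters that make up a character are separated with a background-thinning method. Two-layer peripheral distance features are then extracted, and the letters are recognized by a neural network combined with structural analysis. Fine classification of Korean characters is studied in light of the actual candidate characters and the compositional properties of Korean script. In addition, recognition post-processing was tested on the basis of an existing Korean word statistics table, with fairly good results. The main work of this thesis is as follows:
(1) The structural characteristics of Korean characters are analyzed, and vertical and horizontal projection histograms are used to determine the background regions to be thinned. Thinning these background regions yields the segmentation lines between letters, which are used to separate each letter.
(2) Two-layer peripheral distance features are extracted from the separated letters, and a three-layer BP neural network is built with these feature vectors as input. Letters are then recognized using the neural network together with structural characteristics, and the candidate character group output by an existing printed Korean recognition system is analyzed to decide the recognized character. Recognition of similar-character candidate groups was studied for 4 commonly used printed Korean typefaces.
(3) A preliminary recognition post-processing system is built. Using a bidirectional search method, substantive words and attached (empty) words are looked up in the Korean word statistics table, and misrecognized words in a sentence are corrected, which improves the recognition system to some extent.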
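The projection-histogram step described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names `projection_profiles` and `background_runs` and the toy glyph are hypothetical, and the sketch only locates rows or columns that are entirely background. Letters in real Hangul glyphs often overlap in projection, which is presumably why the thesis goes on to thin the background region to obtain a segmentation line; that thinning step is not shown here.

```python
def projection_profiles(image):
    """Return (row_sums, col_sums) for a binary image given as rows of 0/1."""
    row_sums = [sum(row) for row in image]        # horizontal projection
    col_sums = [sum(col) for col in zip(*image)]  # vertical projection
    return row_sums, col_sums


def background_runs(profile):
    """Return inclusive (start, end) index pairs of maximal zero runs."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value == 0 and start is None:
            start = i                      # a background run begins
        elif value != 0 and start is not None:
            runs.append((start, i - 1))    # the run ended at the previous index
            start = None
    if start is not None:                  # run reaches the end of the profile
        runs.append((start, len(profile) - 1))
    return runs


# Hypothetical toy glyph: a box on the left, a two-column gap,
# and a vertical bar on the right (1 = ink, 0 = background).
GLYPH = [
    [1, 1, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
]

rows, cols = projection_profiles(GLYPH)
gaps = background_runs(cols)  # candidate background regions between letters
```

For the toy glyph, `gaps` contains the single column range between the box and the bar, which is where a background region for thinning would be sought.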
Other Abstract: Hangul (Korean) is a script whose characters are composed of consonants and vowels. Because Hangul has much in common with Chinese characters, some methods used in Chinese character recognition can also be applied to Hangul recognition. Hangul characters can be classified into 6 types according to the form of the vowel and the presence of a final consonant, which gives over 11000 possible characters, and many of these look remarkably similar. To reduce the complexity of character recognition, this thesis separates the graphemes of a character and identifies the separated graphemes independently. Building on an existing Hangul recognition system, a background-thinning technique is proposed to separate the graphemes, which are then recognized by a neural network classifier using peripheral features. Finally, a character is recognized by combining the recognized graphemes with information from the candidate characters. Furthermore, an efficient post-processing method based on Hangul word statistics is proposed. The main points of this thesis are:
1. By analyzing the structure of Hangul characters, horizontal and vertical projection histograms were used to determine the background region to be thinned. Thinning the background region of the character image then gave the segmentation lines between graphemes, which were used to separate each grapheme.
2. A 3-layer BP neural network was trained on peripheral feature vectors extracted from the grapheme images. Consonants and vowels were recognized with the neural network and structural analysis, and similar characters were then distinguished by analyzing the candidate groups of similar Hangul characters in the 4 most frequently used printed Hangul fonts.
3. Wrongly recognized words were found and corrected by searching for substantive and empty words in the Hangul word statistics with a two-direction search method. This post-processing method improved the recognition accuracy of character classification.
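The figure of over 11000 characters and the 6 structural types mentioned in the abstract can be illustrated with the Unicode Hangul Syllables block, which encodes 19 x 21 x 28 = 11172 precomposed syllables starting at U+AC00. The sketch below is an illustration under assumptions, not the thesis's classifier: the grouping of vowels into vertical, horizontal, and compound shapes is a common convention that is assumed here rather than taken from the thesis, and the names `decompose` and `structure_type` are hypothetical.

```python
HANGUL_BASE = 0xAC00
N_INITIAL, N_MEDIAL, N_FINAL = 19, 21, 28  # final index 0 means "no final consonant"
N_SYLLABLES = N_INITIAL * N_MEDIAL * N_FINAL

# Medial (vowel) indices in Unicode order. Assumed grouping by layout:
# vertical vowels sit to the right of the initial consonant,
# horizontal vowels sit below it, and compound vowels do both.
VERTICAL = {0, 1, 2, 3, 4, 5, 6, 7, 20}
HORIZONTAL = {8, 12, 13, 17, 18}


def decompose(syllable):
    """Split a precomposed Hangul syllable into (initial, medial, final) indices."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < N_SYLLABLES:
        raise ValueError("not a precomposed Hangul syllable")
    initial, rest = divmod(offset, N_MEDIAL * N_FINAL)
    medial, final = divmod(rest, N_FINAL)
    return initial, medial, final


def structure_type(syllable):
    """Return (vowel_shape, has_final) as a stand-in for the 6 structural types."""
    _, medial, final = decompose(syllable)
    if medial in VERTICAL:
        shape = "vertical"
    elif medial in HORIZONTAL:
        shape = "horizontal"
    else:
        shape = "compound"
    return shape, final != 0


print(structure_type("가"))  # ('vertical', False)
print(structure_type("광"))  # ('compound', True)
```

Three vowel shapes combined with the presence or absence of a final consonant give six categories, so a recognizer that first determines the structural type only has to separate and identify two or three graphemes per character instead of discriminating among all 11172 syllables directly.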
Shelf Number: XWLW863
Other Identifier: 200228014603567
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/6893
Collection: Graduates_Master's Theses
Recommended Citation
GB/T 7714
许日俊. 印刷体朝鲜文识别方法研究[D]. 中国科学院自动化研究所. 中国科学院研究生院,2005.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.