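Multi-Font, Large-Character-Set Printed Character Recognition (多字体大字符集印刷体字符识别)
高涛 (Gao Tao)
Subtype: Master of Engineering
Thesis Advisor: 刘迎建 (Liu Yingjian)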
1999-06-01
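Degree Grantor: Institute of Automation, Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Discipline: Pattern Recognition and Intelligent Systems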
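Abstract: Over the past decade, OCR research has advanced rapidly and has been put to practical use in commercial products. Practical demands on the technology, however, grow by the day: users want not only higher recognition rates but also a wider recognition scope and support for more font variation. For example, personal business card management, a very promising application, requires a character set of about 10,000 characters in more than ten fonts, a problem that traditional OCR techniques struggle to solve. Against this background, this thesis uses a growing self-organizing neural network model together with several complementary character features to build a recognition system for multi-font, large-character-set character recognition. The main contents of the thesis are as follows. First, it gives a general survey of the history and current state of research in pattern recognition, neural networks, and character recognition, and presents the research background and the approach taken to the problem. It then describes in detail the key techniques used in the pre-processing, feature extraction, classifier design, and language post-processing stages of character recognition, and points out their existing shortcomings. Next, it discusses Kohonen's self-organizing model in terms of its physiological basis, network structure, and weight updating, and on that basis introduces the supervised LVQ learning algorithm and its extensions. Finally, it describes the recognition system built in this work and reports experimental results, concluding that the system is one effective method for multi-font, large-character-set character recognition. The recognition core built in this thesis has been applied in Hanwang Company's business card recognition and management product, and feedback from preliminary testing and use indicates that the core has reached a practical level.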
Other Abstract: The research and development of OCR has advanced greatly in the past decade, and many commercial products have been brought to market. However, practical needs demand more of OCR technology: not only higher recognition rates, but also a larger character set and more fonts. For example, business card recognition, a promising commercial field, covers about 10,000 characters and more than 10 fonts. Current OCR products cannot yet deal with this problem, so we set up a recognition system using a growing self-organizing neural network and several complementary character features. The main contents of this thesis are as follows. First, the development of research in pattern recognition, neural networks, and character recognition is reviewed, and the background of this work and the general approach are introduced. Second, some important techniques used in character recognition are described in detail, including pre-processing, feature extraction, classifier design, and language post-processing. Next, the self-organizing neural network is analyzed in terms of its biological basis, topological structure, and learning process, and the supervised learning algorithm LVQ and its adaptive variants are presented. Finally, the recognition system is described and experimental results are provided, leading to the conclusion that the system is an effective approach to the problem. A recognition core based on the method in this thesis has been used in Business Card Management, one of the leading products of Hanwang Company, and has received good evaluations from users.
Shelf Number: XWLW552
Other Identifier: 552
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/7285
Collection: Graduates_Master's Theses (毕业生_硕士学位论文)
Recommended Citation
GB/T 7714
高涛. 多字体大字符集印刷体字符识别[D]. 中国科学院自动化研究所. 中国科学院自动化研究所,1999.