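Multi-Font, Large-Character-Set Printed Character Recognition (多字体大字符集印刷体字符识别)
Gao Tao (高涛)
Degree type: Master of Engineering
Advisor: Liu Yingjian (刘迎建)
1999-06-01
Degree-granting institution: Institute of Automation, Chinese Academy of Sciences
Place of conferral: Institute of Automation, Chinese Academy of Sciences
Degree discipline: Pattern Recognition and Intelligent Systems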
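Abstract: Over the past decade, OCR research has advanced rapidly and the technology has been put to practical use in commercial products. Practical demands on it, however, are growing by the day. Users want not only a higher recognition rate but also a broader recognition scope and coverage of more font variations. For example, personal business card management, a very promising application, requires a character set of about 10,000 characters in more than ten fonts, which traditional OCR techniques struggle to handle. Against this background, this thesis uses a growing self-organizing neural network model, together with several complementary character features, to build a recognition system for multi-font, large-character-set character recognition. The main contents of the thesis are as follows. First, it gives a general survey of the history and current state of research in pattern recognition, neural networks, and character recognition, and presents the research background and the approach taken. It then describes in detail some key techniques applied in preprocessing, feature extraction, classifier design, and linguistic post-processing for character recognition, and points out their existing shortcomings. Next, it discusses Kohonen's self-organizing model in terms of its physiological basis, network structure, and weight updating, and on that basis introduces the supervised LVQ learning algorithm and its extensions. Finally, it presents the recognition system built in this work and gives experimental results, concluding that the system is one effective method for multi-font, large-character-set character recognition. The recognition core built in this work has been used in Hanwang's business card recognition and management product, and feedback from preliminary testing and use indicates that the core has reached a practical level.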
Other abstract: Research and development in OCR has advanced greatly over the past decade, and many commercial products have been brought to market. However, practical needs call for further progress in OCR technology: they require not only a higher recognition rate but also a larger character set and more fonts. For example, business card recognition, a promising commercial field, covers about 10,000 characters and more than 10 fonts. Current OCR products cannot yet handle this problem, so we build a recognition system using a growing self-organizing neural network and several complementary character features. The main contents of this thesis are as follows. First, drawing on the development of research in pattern recognition, neural networks, and character recognition, we introduce the background of this work and the general approach. Second, we describe in detail some important techniques used in character recognition, such as preprocessing, feature extraction, classifier design, and language post-processing. Next, the self-organizing neural network is analyzed in terms of its biological basis, topological structure, and learning process, and the supervised learning algorithm LVQ and its adaptive variants are presented. Finally, a recognition system is built and experimental results are reported, leading to the conclusion that an effective approach to the problem has been found. A recognition core based on the method in this thesis has been used in Business Card Management, one of the leading products of Hanwang Company, and has been well received by users.
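Call number: XWLW552
Other identifier: 552
Language: Chinese
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/7285
Collection: Graduates_Master's Theses
Recommended citation
GB/T 7714
Gao Tao. Multi-Font, Large-Character-Set Printed Character Recognition [D]. Institute of Automation, Chinese Academy of Sciences. Institute of Automation, Chinese Academy of Sciences, 1999.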