Title: Methods for Online Chinese Handwritten Text Recognition
Original title: 联机中文手写文本识别方法研究
Author: Wang Dahan (王大寒)
2012-05-31
Degree type: Doctor of Engineering
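Abstract (translated from Chinese): With the advancement and widespread use of mobile devices such as mobile phones, tablet computers, electronic whiteboards, and digital pens, online handwriting input is being applied more widely and is attracting growing attention. These devices also generate large numbers of online handwritten documents, and analyzing and recognizing such documents is important for digitizing, analyzing, and retrieving handwritten notes. At the same time, fast real-time handwritten text input has become a current application need. Against this background, this dissertation studies methods for online Chinese handwritten text recognition, aiming to integrate contextual information to raise the text recognition rate and, in response to the need for handwritten text input, to develop a real-time recognition method for online handwritten text. The main work and contributions are as follows: (1) To support research on unconstrained Chinese handwriting recognition, a large-scale online handwriting database, CASIA-OLHWDB, was collected and organized. The database was released at the 2011 International Conference on Document Analysis and Recognition and is free for academic use. To process the online handwriting data, an annotation tool was designed and developed, and the online data were annotated at the character level. The database includes both isolated characters (DB1.0~1.2) and text data (DB2.0~2.2), written by 1,020 people. The isolated-character sets contain 3,912,017 samples (7,356 classes), and the text sets contain 5,092 document pages (52,221 text lines, 1,348,969 characters in total). The database can be used for research on document segmentation, handwritten character recognition, text line recognition, document retrieval, writer adaptation, and writer identification. (2) To better integrate the character classifier with contextual information in handwritten text line recognition, this dissertation compares several classifier confidence transformation methods, proposes two improved class-dependent confidence parameter estimation methods, and proposes a method for learning confidence parameters at the string level. Experiments show that string-level confidence parameter estimation based on the Minimum Classification Error (MCE) criterion effectively improves text line recognition accuracy. (3) To meet the need for handwritten input on current handwriting devices, a real-time recognition method for Chinese handwritten sentences is proposed and implemented as a system. The core idea is that the segmentation-recognition candidate lattice is updated dynamically during writing. The method lets the user write continuously while recognition runs in real time, which increases input speed. It also makes full use of linguistic context and achieves higher recognition accuracy than isolated character recognition. Experimental results confirm the effectiveness and practicality of the method.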
English abstract: With the advances in and increasing use of mobile devices such as cell phones, tablet PCs, electronic whiteboards, and digital pens, online handwriting-based text input is attracting renewed interest. Efficient techniques are needed for processing, recognizing, and retrieving the large volume of digital ink documents produced by pen input devices. Online handwritten text recognition is at the heart of document analysis and recognition, and is expected to recognize text in real time during writing. To meet these needs, this dissertation investigates online handwritten text recognition by integrating multiple contexts and, in particular, considers a method for real-time Chinese handwritten sentence recognition. The major contributions of this dissertation are as follows: (1) To support research on unconstrained Chinese handwriting recognition, we collected and annotated a large database of online Chinese handwriting, CASIA-OLHWDB, which was released at ICDAR 2011 and is freely available for research. The handwritten samples were produced by 1,020 writers using an Anoto pen on paper, so that both online data (trajectories) and offline data (scanned images) were obtained. An efficient annotation tool was developed for processing the online data. CASIA-OLHWDB includes both isolated characters (DB1.0-1.2) and handwritten texts (continuous scripts, DB2.0-2.2). The isolated-character datasets contain 3,912,017 samples of 7,356 classes, and the text datasets include 5,092 pages (52,221 text lines, 1,348,969 characters in total). The database can be used for typical research tasks in handwritten document analysis, such as handwritten document segmentation, handwritten character recognition, text line recognition, handwritten document retrieval, writer adaptation, and writer identification. (2) We investigated methods for confidence transformation (CT) of classifier outputs in handwritten text line recognition. After comparing the performance of class-dependent and class-independent CT, we propose two regularized class-dependent CT methods and, in particular, a string-level confidence learning method under the Minimum Classification Error (MCE) criterion. In experiments on online Chinese handwritten text recognition, the string-level confidence learning method was shown to effectively improve recognition performance. (3) To meet the increasing need for sentence-based handwritten text input, we propose an approach for real-time recognition of online...
Keywords: Online Handwriting Database; Handwritten Text Recognition; Confidence Transformation; String-level Confidence Parameter Learning; Real-time Handwritten Sentence Recognition
Language: Chinese
Document type: Dissertation
Item identifier: http://ir.ia.ac.cn/handle/173211/6467
Collection: Graduates_Doctoral Dissertations
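Recommended citation
GB/T 7714
Wang Dahan. Methods for Online Chinese Handwritten Text Recognition (联机中文手写文本识别方法研究) [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences, 2012.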
Files in this item
File name/size: CASIA_20081801462806 (32548 KB)
Access type: Not yet open
License: CC BY-NC-SA
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.