Research on Text Detection Methods in Complex Scene Images | |
黄燃东 | |
2021-05-27 | |
Pages | 118 |
Degree type | Doctoral |
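Abstract (translated from Chinese) | Scene text detection aims to accurately detect text regions in natural scene images, and usually serves as a scene text ... 1. To address the false-positive problem in scene text detection, this dissertation proposes a feature- and classification-focused ... 2. To address the training sample imbalance problem in scene text detection, this dissertation proposes a class-balanced first-power ... 3. To address the complexity and low efficiency of arbitrary-shape text detection methods, this dissertation proposes one based on ... |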
English abstract | Scene text detection aims to accurately detect text regions in natural scene images and usually serves as a preliminary step for scene text recognition. It still faces many challenges, such as variation in text scale, orientation, shape, and aspect ratio, and highly complex image backgrounds. Overcoming these difficulties requires studying methods for extracting robust text features and designing concise, efficient detection frameworks. In recent years, convolutional neural networks have effectively improved the ability of scene text detection to cope with these challenges. This dissertation builds on convolutional neural networks, and its main contributions are as follows: 1. To solve the problem of false positives in scene text detection, this dissertation proposes a Features and Score Map Focused Text Attention Hybrid Mechanism (FSFTAHM).
3. To address the complexity and low efficiency of arbitrary-shape text detection methods, this dissertation proposes a text detection method based on parallel regression and segmentation, which regresses the circumscribed horizontal rectangles of text instances and segments arbitrary-shape text in parallel. The method consists of four modules: convolutional feature extraction and fusion, network outputs, post-processing, and a Feature Semantic Enhancement Mechanism (FSEM). The convolutional feature extraction and fusion module extracts and merges convolutional features of the image. The network outputs comprise a score map branch, a rectangle branch, and a Text Center-ness (TC) branch. The score map branch classifies and segments text instances in parallel. The rectangle branch regresses the circumscribed horizontal rectangles of text instances. The TC branch is used to avoid incomplete segmentation of text and to enhance the ability of the features to distinguish text from background. Post-processing offers two test-time modes: locality-aware non-maximum suppression and rectangle projection. FSEM further enhances the ability of the features to distinguish text from background. The proposed method yields a more concise model for arbitrary-shape text detection and outperforms most text detection methods in both accuracy and speed. |
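Keywords | scene text detection, attention mechanism, training sample imbalance, parallel regression and segmentation, convolutional neural network |
Language | Chinese |
Document type | Dissertation |
Item identifier | http://ir.ia.ac.cn/handle/173211/44557 |
Collection | Graduates_Doctoral dissertations |
Recommended citation (GB/T 7714) | 黄燃东. Research on Text Detection Methods in Complex Scene Images [D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2021. |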
Files in this item | ||||||
File name/size | Document type | Version type | Access type | License | ||
Thesis-最终版-黄燃东-上传至答辩 (21972KB) | Dissertation | | Restricted access | CC BY-NC-SA |