Knowledge Commons of Institute of Automation, CAS
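Research on Text Detection and Recognition Methods in Images and Videos (图像与视频中的文本检测与识别方法研究) | |
冯伟 (Feng Wei) | |
2021-06 | |
Pages | 124 |
Degree type | Doctoral |
Chinese abstract | In recent years, with the widespread use of the Internet, a large number of natural scene images and videos via the Internet |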
English abstract | In recent years, with the wide use of the Internet, large numbers of scene images and videos have spread online. Text in images and videos helps to understand, analyze, and retrieve them quickly and effectively. Compared with text in scanned images, scene text is more diverse and uncertain because of complex backgrounds and variations in image resolution, illumination, and perspective. This dissertation studies text detection and recognition in scene images and videos, covering common quadrilateral text in images, arbitrarily shaped text, and text in videos. The main contributions are summarized as follows. 1. A quadrilateral scene text detection method based on recurrent instance segmentation is proposed to avoid the adhesion problem, in which adjacent text instances are merged into one region. A fully convolutional network classifies text and non-text regions, and a recurrent neural network then uses the features extracted by the fully convolutional network to detect and segment one text instance at each time step. Because the method detects text by instance segmentation, it can effectively solve the adhesion problem. Experimental results show that the proposed method achieves competitive results on two quadrilateral scene text datasets. 2. A bottom-up method for end-to-end arbitrarily shaped text spotting is proposed. The text detector describes the shape of a text instance with a series of rotated squares and aggregates them to obtain the final bounding box. A novel operator, RoISlide, then extracts the arbitrarily shaped text region from the feature map by applying affine transformations to the detected rotated squares. On the features extracted by RoISlide, a text recognizer based on a convolutional neural network and connectionist temporal classification recognizes the text. The proposed method achieves state-of-the-art performance on two arbitrarily shaped text datasets and competitive results on one quadrilateral text dataset. 3. A residual dual scale method that fuses bottom-up and top-down processing is proposed for scene text spotting. The bottom-up detector describes the shape of texts with a series of rotated squares, the top-down detector represents the region of interest of the text with the minimum enclosing rotated rectangle, and the final bounding box is determined by 4. A semantic-aware video text detection method is proposed. Specifically, a character center segmentation branch extracts semantic features and encodes the category and position of characters. A novel appearance-semantic-geometry descriptor is then used to track text instances, in which semantic features can improve the robustness against appearance |
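Keywords | text detection and recognition; instance segmentation; bottom-up; top-down; semantic features |
Language | Chinese |
Document type | Dissertation |
Item identifier | http://ir.ia.ac.cn/handle/173211/45044 |
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems_Pattern Analysis and Learning |
Recommended citation (GB/T 7714) | 冯伟. 图像与视频中的文本检测与识别方法研究[D]. 中国科学院自动化研究所. 中国科学院大学,2021. |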
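The abstract describes the recognizer in contribution 2 as built on a convolutional neural network and connectionist temporal classification (CTC). As a rough illustration of how per-time-step CTC scores are commonly turned into a string, the sketch below uses best-path (greedy) decoding: pick the top class at each step, collapse consecutive repeats, and drop blanks. It is not the dissertation's implementation; the function name `ctc_greedy_decode`, the choice of index 0 for the blank class, and the toy alphabet are assumptions made for this example.

```python
from typing import List, Sequence

BLANK = 0  # assumption: class index 0 is the CTC blank symbol


def ctc_greedy_decode(scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding (hypothetical helper for illustration).

    scores[t][k] is the score of class k at time step t; class k >= 1
    corresponds to alphabet[k - 1].
    """
    # 1. Take the highest-scoring class at every time step.
    best_path: List[int] = [
        max(range(len(step)), key=step.__getitem__) for step in scores
    ]

    # 2. Collapse consecutive repeats, then 3. drop blanks.
    chars: List[str] = []
    prev = BLANK
    for idx in best_path:
        if idx != BLANK and idx != prev:
            chars.append(alphabet[idx - 1])
        prev = idx
    return "".join(chars)


# Toy input: six time steps over the classes (blank, 'a', 'b').
demo_scores = [
    [0.10, 0.80, 0.10],  # a
    [0.10, 0.70, 0.20],  # a (repeat, collapsed)
    [0.90, 0.05, 0.05],  # blank (separates the two a's)
    [0.20, 0.70, 0.10],  # a
    [0.10, 0.20, 0.70],  # b
    [0.10, 0.10, 0.80],  # b (repeat, collapsed)
]
decoded = ctc_greedy_decode(demo_scores, "ab")
```

The blank between the second and third steps is what lets the decoder keep two separate occurrences of the same character instead of merging them.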
Files in this item | ||||||
File name/size | Document type | Version type | Access type | License | ||
图像与视频中的文本检测与识别方法研究.p(18533KB) | Dissertation | | Open access | CC BY-NC-SA |