CASIA OpenIR
Residual Dual Scale Scene Text Spotting by Fusing Bottom-Up and Top-Down Processing
Wei Feng1,2; Fei Yin1,2; Xu-Yao Zhang1,2; Wenhao He3; Cheng-Lin Liu1,2,4
Source Publication: International Journal of Computer Vision
2020-10
Volume: 1; Issue: 38; Pages: 1872–1885
Abstract

Existing methods for arbitrary shaped text spotting can be divided into two categories: bottom-up methods detect and recognize local areas of text, and then group them into text lines or words; top-down methods detect text regions of interest, then apply polygon fitting and text recognition to the detected regions. In this paper, we analyze the advantages and disadvantages of these two methods, and propose a novel text spotter by fusing bottom-up and top-down processing. To detect text of arbitrary shapes, we employ a bottom-up detector to describe text with a series of rotated squares, and design a top-down detector to represent the region of interest with a minimum enclosing rotated rectangle. Then the text boundary is determined by fusing the outputs of two detectors. To connect arbitrary shaped text detection and recognition, we propose a differentiable operator named RoISlide, which can extract features for arbitrary text regions from whole image feature maps. Based on the extracted features through RoISlide, a CNN and CTC based text recognizer is introduced to make the framework free from character-level annotations. To improve the robustness against scale variance, we further propose a residual dual scale spotting mechanism, where two spotters work on different feature levels, and the high-level spotter is based on residuals of the low-level spotter. Our method has achieved state-of-the-art performance on four English datasets and one Chinese dataset, including both arbitrary shaped and oriented texts. We also provide abundant ablation experiments to analyze how the key components affect the performance.
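
The abstract notes that a CTC based recognizer frees the framework from character-level annotations. As a rough illustration of why that is possible, the sketch below shows the standard CTC best-path decoding rule in general form: per-frame labels are collapsed when repeated and blanks are dropped, so no per-character position is ever needed. This is not the authors' implementation; the function name `ctc_greedy_decode`, the toy alphabet, and the frame sequence are all assumptions made up for the example.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse consecutive repeated labels, then drop blanks (CTC best-path rule)."""
    decoded = []
    previous = None
    for label in frame_labels:
        # Keep a label only when it differs from the previous frame and is not blank.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Hypothetical alphabet for illustration: index 0 is the CTC blank symbol.
alphabet = ["-", "t", "e", "x"]

# Hypothetical per-frame argmax labels, as a recognizer might emit along a text line.
frames = [1, 1, 0, 2, 2, 0, 3, 3, 0, 1]

text = "".join(alphabet[i] for i in ctc_greedy_decode(frames))
print(text)
```

A blank between two identical labels keeps them as separate characters, which is how repeated letters survive the collapsing step.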

Keyword: Scene text spotting; Arbitrary shapes; Bottom-up; Top-down; Residual dual scale
Document Type: Journal article
Identifier: http://ir.ia.ac.cn/handle/173211/41453
Collection: Institute of Automation, Chinese Academy of Sciences
Corresponding Author: Cheng-Lin Liu
Affiliation: 1. National Laboratory of Pattern Recognition, Institute of Automation of Chinese Academy of Sciences, Beijing 100190, People’s Republic of China
2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, People’s Republic of China
3. Tencent Map Big Data Lab, Beijing 100193, People’s Republic of China
4. CAS Center for Excellence of Brain Science and Intelligence Technology, Beijing 100190, People’s Republic of China
First Author Affiliation: Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
Corresponding Author Affiliation: Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
Recommended Citation
GB/T 7714: Wei Feng, Fei Yin, Xu-Yao Zhang, et al. Residual Dual Scale Scene Text Spotting by Fusing Bottom-Up and Top-Down Processing[J]. International Journal of Computer Vision, 2020, 1(38): 1872–1885.
APA: Wei Feng, Fei Yin, Xu-Yao Zhang, Wenhao He, & Cheng-Lin Liu. (2020). Residual Dual Scale Scene Text Spotting by Fusing Bottom-Up and Top-Down Processing. International Journal of Computer Vision, 1(38), 1872–1885.
MLA: Wei Feng, et al. "Residual Dual Scale Scene Text Spotting by Fusing Bottom-Up and Top-Down Processing." International Journal of Computer Vision 1.38 (2020): 1872–1885.
Files in This Item:
File Name/Size: 发表版.pdf (4242KB)
DocType: Journal article
Version: Author accepted manuscript
Access: Open access
License: CC BY-NC-SA
File name: 发表版.pdf
Format: Adobe PDF

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.