Locate then Segment: A Strong Pipeline for Referring Image Segmentation
Jing Y(荆雅)1,2; Kong T(孔涛)3; Wang W(王威)1,2; Wang L(王亮)1,2; Li L(李磊)3; Tan TN(谭铁牛)1,2
Conference Name: 2021 IEEE Conference on Computer Vision and Pattern Recognition
Conference Date: 2021-6
Conference Place: virtual

Referring image segmentation aims to segment the objects referred to by a natural language expression. Previous methods usually focus on designing an implicit and recurrent feature interaction mechanism that fuses the visual-linguistic features to directly generate the final segmentation mask, without explicitly modeling the localization information of the referent instances. To address this limitation, we view the task from another perspective by decoupling it into a "Locate-Then-Segment" (LTS) scheme. Given a language expression, people generally first attend to the corresponding target image regions, then generate a fine segmentation mask of the object based on its context. The LTS first extracts and fuses visual and textual features to get a cross-modal representation, then applies a cross-modal interaction on the visual-textual features to locate the referred object with a position prior, and finally generates the segmentation result with a light-weight segmentation network. Our LTS is simple but surprisingly effective. On three popular benchmark datasets, the LTS outperforms all previous state-of-the-art methods by a large margin (e.g., +3.2% on RefCOCO+ and +3.4% on RefCOCOg). In addition, our model is more interpretable because it explicitly locates the object, which is also supported by visualization experiments. We believe this framework is promising to serve as a strong baseline for referring image segmentation.
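The three stages named in the abstract (fuse, locate, segment) can be illustrated with a toy data-flow sketch. This is not the authors' implementation: the element-wise product used for fusion, the per-pixel linear projections, the random weights, and all function and parameter names (fuse, locate, segment, run_demo, w_loc, w_seg, w_prior) are assumptions chosen only to show how a position prior produced by the locate step can feed a small segmentation head.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fuse(visual, text):
    """Stage 1 (assumed form): element-wise product of every pixel feature
    with the sentence feature, giving a cross-modal feature map."""
    return [[[v * t for v, t in zip(pixel, text)] for pixel in row] for row in visual]


def locate(fused, w_loc):
    """Stage 2 (assumed form): project each fused pixel feature to a score
    in (0, 1); the resulting heatmap plays the role of the position prior."""
    return [[sigmoid(dot(pixel, w_loc)) for pixel in row] for row in fused]


def segment(fused, prior, w_seg, w_prior, threshold=0.5):
    """Stage 3 (assumed form): a per-pixel linear head over the fused feature
    plus the position prior, thresholded into a binary mask."""
    mask = []
    for feat_row, prior_row in zip(fused, prior):
        mask.append([
            1 if sigmoid(dot(pixel, w_seg) + w_prior * p) >= threshold else 0
            for pixel, p in zip(feat_row, prior_row)
        ])
    return mask


def run_demo(height=4, width=6, channels=8, seed=0):
    """Run the three stages on random stand-in features and weights."""
    rng = random.Random(seed)

    def vec():
        return [rng.uniform(-1.0, 1.0) for _ in range(channels)]

    visual = [[vec() for _ in range(width)] for _ in range(height)]  # stand-in image features
    text = vec()                                                      # stand-in sentence feature
    fused = fuse(visual, text)
    prior = locate(fused, vec())
    mask = segment(fused, prior, vec(), w_prior=1.0)
    return prior, mask


if __name__ == "__main__":
    prior, mask = run_demo()
    for row in mask:
        print("".join(str(v) for v in row))
```

The point of the sketch is the ordering: the prior is computed before the mask and passed into the segmentation step as an explicit input, which is what makes the localization inspectable on its own.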

Document Type: Conference paper
Affiliation:
1. Center for Research on Intelligent Perception and Computing (CRIPAC), National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA)
2. School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS)
3. ByteDance AI Lab
First Author Affiliation: Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
Recommended Citation
GB/T 7714
Jing Y,Kong T,Wang W,et al. Locate then Segment: A Strong Pipeline for Referring Image Segmentation[C],2021.
Files in This Item:
File Name/Size: 2103.16284.pdf (4191KB)
DocType: Conference paper
Access: Open access
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.