Locate then Segment: A Strong Pipeline for Referring Image Segmentation
Jing Y(荆雅)1,2; Kong T(孔涛)3; Wang W(王威)1,2; Wang L(王亮)1,2; Li L(李磊)3; Tan TN(谭铁牛)1,2
2021-06
Conference name: 2021 IEEE Conference on Computer Vision and Pattern Recognition
Conference date: 2021-06
Conference location: virtual
Abstract

Referring image segmentation aims to segment the objects referred to by a natural language expression. Previous methods usually focus on designing an implicit and recurrent feature interaction mechanism that fuses the visual-linguistic features to directly generate the final segmentation mask, without explicitly modeling the localization information of the referent instances. To tackle these problems, we view this task from another perspective by decoupling it into a "Locate-Then-Segment" (LTS) scheme. Given a language expression, people generally first attend to the corresponding target image regions, then generate a fine segmentation mask of the object based on its context. The LTS first extracts and fuses both visual and textual features to get a cross-modal representation, then applies a cross-modal interaction on the visual-textual features to locate the referred object with a position prior, and finally generates the segmentation result with a light-weight segmentation network. Our LTS is simple but surprisingly effective. On three popular benchmark datasets, the LTS outperforms all previous state-of-the-art methods by a large margin (e.g., +3.2% on RefCOCO+ and +3.4% on RefCOCOg). In addition, our model is more interpretable because it explicitly locates the object, which is also shown by visualization experiments. We believe this framework is promising to serve as a strong baseline for referring image segmentation.
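
The three stages named in the abstract (fuse, locate, segment) can be illustrated with a toy sketch. This is not the authors' implementation. It uses plain Python lists in place of a CNN and a language encoder, and every function name, the element-wise fusion, the channel-sum localization score, and the head weights are assumptions chosen only to make the data flow visible.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse(visual, text):
    """Fuse each pixel's visual feature with the sentence feature.

    Assumption: element-wise product stands in for the cross-modal fusion.
    """
    return [[[v * t for v, t in zip(pixel, text)] for pixel in row]
            for row in visual]


def locate(fused):
    """Locate step: one score per pixel, squashed to (0, 1).

    Assumption: a channel sum stands in for the cross-modal interaction.
    The result plays the role of a coarse position prior.
    """
    return [[sigmoid(sum(pixel)) for pixel in row] for row in fused]


def segment(fused, prior, weights, bias):
    """Segment step: a hypothetical per-pixel linear head.

    Assumption: the prior is concatenated to the fused channels, and a
    single linear layer stands in for the light-weight segmentation network.
    """
    mask = []
    for f_row, p_row in zip(fused, prior):
        row = []
        for pixel, p in zip(f_row, p_row):
            features = pixel + [p]  # fused channels plus the position prior
            logit = sum(w * x for w, x in zip(weights, features)) + bias
            row.append(1 if sigmoid(logit) > 0.5 else 0)
        mask.append(row)
    return mask


def locate_then_segment(visual, text, weights, bias):
    fused = fuse(visual, text)
    prior = locate(fused)
    return prior, segment(fused, prior, weights, bias)


# Toy input: a 2x2 "image" with 2 channels per pixel and a 2-d sentence vector.
visual = [[[3, 0], [0, 3]],
          [[1, 2], [3, 0]]]
text = [1.0, -1.0]
weights = [0.5, 0.5, 4.0]  # hypothetical head weights; last one weights the prior
bias = -2.0

prior, mask = locate_then_segment(visual, text, weights, bias)
print(mask)  # [[1, 0], [0, 1]]
```

The point of the decoupling is visible in the return value: the intermediate `prior` can be inspected on its own, which is the kind of explicit localization the abstract credits for interpretability.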
 

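Research direction (seven major directions, sub-direction classification): Image and video processing and analysis
Document type: Conference paper
Item identifier: http://ir.ia.ac.cn/handle/173211/44447
Collection: Pattern Recognition Laboratory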
Author affiliations:
1. Center for Research on Intelligent Perception and Computing (CRIPAC), National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA)
2. School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS)
3. ByteDance AI Lab
First author affiliation: National Laboratory of Pattern Recognition
Recommended citation
GB/T 7714
Jing Y, Kong T, Wang W, et al. Locate then Segment: A Strong Pipeline for Referring Image Segmentation[C], 2021.
Files in this item
File name/size: 2103.16284.pdf (4191KB)
Document type: Conference paper
Access type: Open access
License: CC BY-NC-SA