Knowledge Commons of Institute of Automation, CAS
Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks
Chen, Zhiyang (1,2)
2022-11-01
Conference name | Neural Information Processing Systems |
Conference date | 2022-11-28 |
Conference location | New Orleans, Louisiana & Online |
Abstract | Visual tasks vary widely in their output formats and in the content they concern, so it is hard to process them with an identical structure. One main obstacle lies in the high-dimensional outputs of object-level visual tasks. In this paper, we propose an object-centric vision framework, Obj2Seq. Obj2Seq takes objects as basic units and regards most object-level visual tasks as sequence generation problems over objects. These visual tasks can therefore be decoupled into two steps: first, recognize objects of the given categories; then, generate a sequence for each of these objects. The definition of the output sequences varies across tasks, and the model is supervised by matching these sequences with ground-truth targets. Obj2Seq can flexibly determine its input categories to satisfy customized requirements and can be easily extended to different visual tasks. In experiments on MS COCO, Obj2Seq achieves 45.7% AP on object detection, 89.0% AP on multi-label classification, and 65.0% AP on human pose estimation. These results demonstrate its potential to be generally applied to different visual tasks. |
Keywords | transformer; general visual framework; sequence prediction; multi-task |
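Indexed by | EI |
Research direction (seven major directions, sub-direction) | Image and video processing and analysis |
State Key Laboratory planned research direction | Visual information processing |
Associated dataset to be deposited | No |
Document type | Conference paper |
Item identifier | http://ir.ia.ac.cn/handle/173211/56593 |
Collection | Zidong Taichu Large Model Research Center, Large Model Computing |
Author affiliations | 1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; 2. School of Artificial Intelligence, University of Chinese Academy of Sciences; 3. Peng Cheng Laboratory; 4. SenseTime Research |
First author affiliation | National Laboratory of Pattern Recognition |
Recommended citation (GB/T 7714) | Chen, Zhiyang, Zhu, Yousong, Li, Zhaowen, et al. Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks[C], 2022. |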
Files in this item
File name/size | Document type | Version type | Access type | License |
1533_obj2seq_formatt (1289 KB) | Conference paper | | Open access | CC BY-NC-SA |
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.