Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks

	Chen, Zhiyang 1,2; Zhu, Yousong 1; Li, Zhaowen 1,2; Yang, Fan 1,3; Li, Wei 4; Wang, Haixin 1,2; Zhao, Chaoyang 1; Wu, Liwei 4; Zhao, Rui 4; Wang, Jinqiao 1,2,3; Tang, Ming 1
	2022-11-01
Conference name	Neural Information Processing Systems
Conference date	2022-11-28
Conference location	New Orleans, Louisiana & Online
Abstract	Visual tasks vary a lot in their output formats and concerned contents, therefore it is hard to process them with an identical structure. One main obstacle lies in the high-dimensional outputs in object-level visual tasks. In this paper, we propose an object-centric vision framework, Obj2Seq. Obj2Seq takes objects as basic units, and regards most object-level visual tasks as sequence generation problems of objects. Therefore, these visual tasks can be decoupled into two steps. First recognize objects of given categories, and then generate a sequence for each of these objects. The definition of the output sequences varies for different tasks, and the model is supervised by matching these sequences with ground-truth targets. Obj2Seq is able to flexibly determine input categories to satisfy customized requirements, and be easily extended to different visual tasks. When experimenting on MS COCO, Obj2Seq achieves 45.7% AP on object detection, 89.0% AP on multi-label classification and 65.0% AP on human pose estimation. These results demonstrate its potential to be generally applied to different visual tasks.
Keywords	transformer; general visual framework; sequence prediction; multi-task
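Indexed by	EI
Seven major directions: sub-direction classification	Image and video processing and analysis
State Key Laboratory planned research direction	Visual information processing
Associated dataset requiring deposit	No
Document type	Conference paper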
Item identifier	http://ir.ia.ac.cn/handle/173211/56593
Collection	Zidong Taichu Large Model Research Center_Large Model Computing
Author affiliations	1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; 2. School of Artificial Intelligence, University of Chinese Academy of Sciences; 3. Peng Cheng Laboratory; 4. SenseTime Research
First author affiliation	National Laboratory of Pattern Recognition
Recommended citation (GB/T 7714)	Chen, Zhiyang,Zhu, Yousong,Li, Zhaowen,et al. Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks[C],2022.

Files in this item
File name/size	Document type	Version type	Access type	License
1533_obj2seq_formatt (1289KB)	Conference paper		Open access	CC BY-NC-SA