DPT: Deformable Patch-based Transformer for Visual Recognition

	Chen, Zhiyang (1,2); Zhu, Yousong (1); Zhao, Chaoyang (1); Hu, Guosheng (3); Zeng, Wei (4); Wang, Jinqiao (1,2); Tang, Ming (1)
	2021-10
Conference name	ACM International Conference on Multimedia
Conference date	2021-10-20
Conference location	Chengdu, China
Abstract	Transformers have achieved great success in computer vision, but how to split an image into patches remains an open problem. Existing methods usually use a fixed-size patch embedding, which might destroy the semantics of objects. To address this problem, we propose a new Deformable Patch (DePatch) module, which learns to adaptively split images into patches with different positions and scales in a data-driven way rather than using predefined fixed patches. In this way, our method can preserve the semantics in patches well. The DePatch module is plug-and-play and can easily be incorporated into different transformers for end-to-end training. We term this DePatch-embedded transformer the Deformable Patch-based Transformer (DPT) and conduct extensive evaluations of DPT on image classification and object detection. Results show that DPT achieves 81.9% top-1 accuracy on ImageNet classification, and 43.7% box mAP with RetinaNet and 44.3% with Mask R-CNN on MSCOCO object detection. Code is available at: https://github.com/CASIA-IVA-Lab/DPT.
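Indexed by	EI
Seven major directions: sub-direction classification	Image and video processing and analysis
Document type	Conference paper
Item identifier	http://ir.ia.ac.cn/handle/173211/47414
Collection	Zidong Taichu Large Model Research Center_Image and Video Analysis
Corresponding author	Zhao, Chaoyang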
Author affiliations	1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China; 2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; 3. AnyVision, Belfast, UK; 4. Peking University, Beijing, China
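First author affiliation	National Laboratory of Pattern Recognition
Corresponding author affiliation	National Laboratory of Pattern Recognition
Recommended citation (GB/T 7714)	Chen, Zhiyang, Zhu, Yousong, Zhao, Chaoyang, et al. DPT: Deformable Patch-based Transformer for Visual Recognition[C], 2021.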

Files in this item
File name/size	Document type	Version type	Access type	License
DPT Deformable Patch (3799KB)	Conference paper		Open access	CC BY-NC-SA