Knowledge Commons of Institute of Automation, CAS
Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification | |
Ji, Ruyi1,2 | |
Journal | IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY |
ISSN | 1051-8215 |
Publication date | 2023-09-01 |
Volume | 33, Issue: 9, Pages: 5009-5021 |
Abstract | Fine-grained visual classification requires distinguishing sub-categories within the same super-category, which suffers from small inter-class and large intra-class variances. This paper aims to improve the FGVC task towards better performance, for which we deliver a novel dual Transformer framework (coined Dual-TR) with multi-grained assembly. The Dual-TR is well-designed to encode fine-grained objects by two parallel hierarchies, which is amenable to capturing the subtle yet discriminative cues via the self-attention mechanism in ViT. Specifically, we perform orthogonal multi-grained assembly within the Transformer structure for a more robust representation, i.e., intra-layer and inter-layer assembly. The former aims to explore the informative feature in various self-attention heads within the Transformer layer. The latter pays attention to the token assembly across Transformer layers. Meanwhile, we introduce the constraint of center loss to pull intra-class samples' compactness and push that of inter-class samples. Extensive experiments show that Dual-TR performs on par with the state-of-the-art methods on four public benchmarks, including CUB-200-2011, NABirds, iNaturalist2017, and Stanford Dogs. The comprehensive ablation studies further demonstrate the effectiveness of architectural design choices. |
Keywords | Transformer; multi-grained assembly; fine-grained visual classification |
DOI | 10.1109/TCSVT.2023.3248791 |
Indexed by | SCI |
Language | English |
Funding project | Key Research Program of Frontier Sciences, CAS [ZDBSLY-JSC038]; CAAI-Huawei MindSpore Open Fund and Youth Innovation Promotion Association, CAS [2020111] |
Funding organization | Key Research Program of Frontier Sciences, CAS; CAAI-Huawei MindSpore Open Fund and Youth Innovation Promotion Association, CAS |
WOS research area | Engineering |
WOS subject category | Engineering, Electrical & Electronic |
WOS accession number | WOS:001063316800042 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
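Seven major directions: sub-direction classification | Object detection, tracking, and recognition |
State Key Laboratory planned direction classification | Multi-scale information processing |
Paper-associated dataset requiring deposit | No |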
Document type | Journal article |
Item identifier | http://ir.ia.ac.cn/handle/173211/53132 |
Collection | Zidong Taichu Large Model Research Center (紫东太初大模型研究中心) |
Corresponding author | Zhang, Libo |
Author affiliations | 1. Chinese Acad Sci, State Key Lab Comp Sci, Inst Software, Beijing 100190, Peoples R China; 2. Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 101400, Peoples R China; 3. Beijing Informat Sci & Technol Univ, Sch Comp Sci, Beijing 100192, Peoples R China; 4. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 101400, Peoples R China; 5. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China |
Recommended citation (GB/T 7714) | Ji, Ruyi, Li, Jiaying, Zhang, Libo, et al. Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification[J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33(9): 5009-5021. |
APA | Ji, Ruyi, Li, Jiaying, Zhang, Libo, Liu, Jing, & Wu, Yanjun. (2023). Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 33(9), 5009-5021. |
MLA | Ji, Ruyi, et al. "Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification". IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 33.9 (2023): 5009-5021. |
Files in this item |
File name/size | Document type | Version type | Access type | License |
Dual_Transformer_Wit(4636KB) | Journal article | Author accepted manuscript | Open access | CC BY-NC-SA |