Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification
Ji, Ruyi [1,2]; Li, Jiaying [3]; Zhang, Libo [1]; Liu, Jing [4,5]; Wu, Yanjun [1]
Journal: IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
ISSN: 1051-8215
2023-09-01
Volume: 33, Issue: 9, Pages: 5009-5021
Abstract

Fine-grained visual classification (FGVC) requires distinguishing sub-categories within the same super-category, a task made difficult by small inter-class and large intra-class variances. This paper aims to improve FGVC performance with a novel dual Transformer framework (coined Dual-TR) with multi-grained assembly. Dual-TR encodes fine-grained objects through two parallel hierarchies, which makes it amenable to capturing subtle yet discriminative cues via the self-attention mechanism in ViT. Specifically, we perform two orthogonal kinds of multi-grained assembly within the Transformer structure, intra-layer and inter-layer, for a more robust representation. The former explores the informative features in the various self-attention heads within a Transformer layer. The latter assembles tokens across Transformer layers. Meanwhile, we introduce a center loss constraint to increase the compactness of intra-class samples and the separation of inter-class samples. Extensive experiments show that Dual-TR performs on par with state-of-the-art methods on four public benchmarks: CUB-200-2011, NABirds, iNaturalist2017, and Stanford Dogs. Comprehensive ablation studies further demonstrate the effectiveness of the architectural design choices.
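
The center loss constraint mentioned in the abstract can be illustrated with a minimal sketch. This record does not give the paper's exact formulation, so the code below assumes the widely used standard center loss (half the squared Euclidean distance between each feature and its class center), which only covers the intra-class "pull" term. The function names `center_loss` and `update_centers`, the moving-average update, and the `alpha` parameter are illustrative assumptions, not the authors' implementation.

```python
def center_loss(features, labels, centers):
    """Mean over the batch of 0.5 * squared distance to the sample's class center.

    features: list of feature vectors (lists of floats)
    labels:   list of class ids, one per feature vector
    centers:  dict mapping class id -> center vector (assumed layout)
    """
    total = 0.0
    for x, y in zip(features, labels):
        c = centers[y]
        total += 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return total / len(features)


def update_centers(features, labels, centers, alpha=0.5):
    """Simplified update: move each center a fraction alpha toward the
    mean of its samples in the current batch (assumption, not the paper's rule)."""
    new_centers = {k: list(v) for k, v in centers.items()}
    for k in centers:
        members = [x for x, y in zip(features, labels) if y == k]
        if not members:
            continue  # class absent from this batch: leave its center unchanged
        mean = [sum(col) / len(members) for col in zip(*members)]
        new_centers[k] = [c + alpha * (m - c) for c, m in zip(centers[k], mean)]
    return new_centers
```

In training, such a term would typically be added to the classification loss with a small weight, so that features of the same class cluster around a shared center while the classifier keeps different classes separable.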

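Keywords: Transformer; multi-grained assembly; fine-grained visual classification
DOI: 10.1109/TCSVT.2023.3248791
Indexed by: SCI
Language: English
Funding project: Key Research Program of Frontier Sciences, CAS [ZDBSLY-JSC038]; CAAI-Huawei MindSpore Open Fund and Youth Innovation Promotion Association, CAS [2020111]
Funding organization: Key Research Program of Frontier Sciences, CAS; CAAI-Huawei MindSpore Open Fund and Youth Innovation Promotion Association, CAS
WOS research area: Engineering
WOS category: Engineering, Electrical & Electronic
WOS accession number: WOS:001063316800042
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Seven major directions, sub-direction classification: Object detection, tracking, and recognition
State key laboratory planned direction classification: Multi-scale information processing
Is there a paper-associated dataset that needs to be deposited?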
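Citation statistics
Times cited: 1 [WOS]
Document type: Journal article
Identifier: http://ir.ia.ac.cn/handle/173211/53132
Collection: 紫东太初大模型研究中心 (Zidong Taichu Large Model Research Center)
Corresponding author: Zhang, Libo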
Affiliations:
1. Chinese Acad Sci, State Key Lab Comp Sci, Inst Software, Beijing 100190, Peoples R China
2. Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 101400, Peoples R China
3. Beijing Informat Sci & Technol Univ, Sch Comp Sci, Beijing 100192, Peoples R China
4. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 101400, Peoples R China
5. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
Recommended citation
GB/T 7714: Ji, Ruyi, Li, Jiaying, Zhang, Libo, et al. Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification[J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33(9): 5009-5021.
APA: Ji, Ruyi, Li, Jiaying, Zhang, Libo, Liu, Jing, & Wu, Yanjun. (2023). Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 33(9), 5009-5021.
MLA: Ji, Ruyi, et al. "Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification". IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 33.9 (2023): 5009-5021.
Files in this item
File name/size: Dual_Transformer_Wit (4636KB)
Document type: Journal article
Version: Author accepted manuscript
Access: Open access
License: CC BY-NC-SA
File name: Dual_Transformer_With_Multi-Grained_Assembly_for_Fine-Grained_Visual_Classification.pdf
Format: Adobe PDF