A Two-Stream Hybrid Convolution-Transformer Network Architecture for Clothing-Change Person Re-Identification

doi:10.1109/TMM.2023.3331569

Authors	Wu, Junyi 1; Huang, Yan 2; Gao, Min 3; Gao, Zhipeng 1; Zhao, Jianqiang 1; Zhang, Huiji 1; Zhang, Anguo 4,5,6,7
Journal	IEEE TRANSACTIONS ON MULTIMEDIA
ISSN	1520-9210
Publication year	2024
Volume	26, Pages: 5326-5339
Corresponding author	Huang, Yan (yan.huang@cripac.ia.ac.cn); Zhang, Anguo (anguo.zhang@hotmail.com)
Abstract	Long-term (also called clothing-change) person re-identification (CC-reID) aims to confirm the identity of pedestrians captured at diverse locations and/or times. Current CC-reID methods rely heavily on ID features learned by CNN architectures. However, with limited receptive fields, a CNN struggles to explore unique and discriminative ID features (e.g., hairstyle, tattoos, and accessories) in small body regions. Compared with a CNN, a Transformer has certain merits in exploring more diverse ID-unique features and retaining more detail, owing to its multi-head self-attention design and the removal of down-sampling operations. In this paper, a two-stream hybrid Convolution-Transformer Network (CT-Net) is proposed for CC-reID, combining a CNN and a Transformer in parallel in an end-to-end learning scheme. Specifically, CT-Net contains a CNN-based stream (C-Stream) and a Transformer-based stream (T-Stream). Compared with using the C-Stream alone, the T-Stream encourages the C-Stream to explore more detailed ID-unique features when clothing information is not reliable in CC-reID. To this end, a Feature Supplement Module (FSM) is proposed to transfer features learned by the T-Stream to the C-Stream, from low level to high level, to mine more ID-unique features. To further enhance the discriminability and complementarity of the ID features learned by CT-Net, we also introduce hierarchical supervision with bilinear pooling (HSBP). Experimental results demonstrate that CT-Net performs favorably against state-of-the-art methods on three CC-reID benchmarks. CT-Net also demonstrates good generalization ability by achieving comparable performance on traditional person re-ID datasets such as Market-1501 and DukeMTMC-reID.
Keywords	Clothing-change person re-identification; ID-unique feature; feature supplement module; hierarchical supervision
DOI	10.1109/TMM.2023.3331569
Keywords [WOS]	RECOGNITION ; ATTENTION
Indexed by	SCI
Language	English
Funding project	National Natural Science Foundation of China
Funding organization	National Natural Science Foundation of China
WOS research area	Computer Science ; Telecommunications
WOS subject category	Computer Science, Information Systems ; Computer Science, Software Engineering ; Telecommunications
WOS accession number	WOS:001189435600030
Publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Document type	Journal article
Item identifier	http://ir.ia.ac.cn/handle/173211/58121
Collection	Pattern Recognition Laboratory
Author affiliations	1. Xiamen Meiya Pico Informat Co Ltd, Xiamen Meiya Informat Secur Res Inst Co Ltd, AI Res Ctr, Xiamen 361008, Peoples R China
	2. Chinese Acad Sci, Inst Automat, Ctr Res Intelligent Percept & Comp, Natl Lab Pattern Recognit, Beijing 100045, Peoples R China
	3. Fuzhou Univ, Coll Phys & Informat Engn, Fujian Key Lab Intelligent Proc & Wireless Transmi, Fuzhou 350025, Peoples R China
	4. Anhui Univ, Sch Artificial Intelligence, Hefei 230039, Peoples R China
	5. Minist Educ, Res Ctr Autonomous Unmanned Syst Technol, Hefei 230039, Peoples R China
	6. Anhui Prov Engn Res Ctr Unmanned Syst & Intelligen, Hefei 230039, Peoples R China
	7. Univ Macau, Inst Microelect, Taipa 999078, Macao, Peoples R China
Corresponding author affiliation	National Laboratory of Pattern Recognition
Recommended citation (GB/T 7714)	Wu, Junyi, Huang, Yan, Gao, Min, et al. A Two-Stream Hybrid Convolution-Transformer Network Architecture for Clothing-Change Person Re-Identification[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26: 5326-5339.
APA	Wu, Junyi.,Huang, Yan.,Gao, Min.,Gao, Zhipeng.,Zhao, Jianqiang.,...&Zhang, Anguo.(2024).A Two-Stream Hybrid Convolution-Transformer Network Architecture for Clothing-Change Person Re-Identification.IEEE TRANSACTIONS ON MULTIMEDIA,26,5326-5339.
MLA	Wu, Junyi,et al."A Two-Stream Hybrid Convolution-Transformer Network Architecture for Clothing-Change Person Re-Identification".IEEE TRANSACTIONS ON MULTIMEDIA 26(2024):5326-5339.