Joint Token and Feature Alignment Framework for Text-Based Person Search
Li, Shangze¹; Lu, Andong¹; Huang, Yan³; Li, Chenglong²; Wang, Liang³
Journal | IEEE SIGNAL PROCESSING LETTERS |
ISSN | 1070-9908 |
Publication Year | 2022 |
Volume | 29 |
Pages | 2238-2242 |
Corresponding Author | Li, Chenglong (lcl1314@foxmail.com) |
Abstract | Text-based person search is a challenging cross-modal retrieval task. Existing works reduce the inter-modality and intra-class gaps by aligning local features extracted from the image and text modalities, an approach that easily leads to mismatches due to the lack of annotation information. Moreover, reducing the two gaps simultaneously in the same feature space is sub-optimal. This work proposes a novel joint token and feature alignment framework that reduces the inter-modality and intra-class gaps progressively. Specifically, we first build a dual-path feature learning network to extract features and perform feature alignment to reduce the inter-modality gap. Second, we design a text generation module that generates token sequences from visual features, and then perform token alignment to reduce the intra-class gap. Finally, we introduce a fusion interaction module that further eliminates modality heterogeneity through a multi-stage feature fusion strategy. Extensive experiments on the CUHK-PEDES dataset demonstrate the effectiveness of our model, which significantly outperforms previous state-of-the-art methods. |
Keywords | Feature extraction; Visualization; Representation learning; Logic gates; Image reconstruction; Transformers; Training; Cross-modal generation; feature alignment; text-based person search; token alignment; transformer |
DOI | 10.1109/LSP.2022.3217682 |
Indexed In | SCI |
Language | English |
Funding Projects | National Natural Science Foundation of China [61976003]; National Natural Science Foundation of China [62076003]; Anhui Provincial Key Research and Development Program [202104d07020008]; Open Project Program of the National Laboratory of Pattern Recognition (NLPR); Gaofeng Discipline Construction Project (Computer Science and Technology) [Z010111016] |
Funding Agencies | National Natural Science Foundation of China; Anhui Provincial Key Research and Development Program; Open Project Program of the National Laboratory of Pattern Recognition (NLPR); Gaofeng Discipline Construction Project (Computer Science and Technology) |
WOS Research Area | Engineering |
WOS Category | Engineering, Electrical & Electronic |
WOS Accession Number | WOS:000880641600004 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
Document Type | Journal Article |
Item Identifier | http://ir.ia.ac.cn/handle/173211/50679 |
Collection | Center for Research on Intelligent Perception and Computing |
Author Affiliations | 1. Anhui Univ, Sch Comp Sci & Technol, Anhui Prov Key Lab Multimodal Cognit Computat, Hefei 230601, Peoples R China; 2. Anhui Univ, Sch Artificial Intelligence, Informat Mat & Intelligent Sensing Lab Anhui Prov, Anhui Prov Key Lab Multimodal Cognit Computat, Hefei 230601, Peoples R China; 3. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China |
Recommended Citation (GB/T 7714) | Li, Shangze, Lu, Andong, Huang, Yan, et al. Joint Token and Feature Alignment Framework for Text-Based Person Search[J]. IEEE SIGNAL PROCESSING LETTERS, 2022, 29: 2238-2242. |
APA | Li, S., Lu, A., Huang, Y., Li, C., & Wang, L. (2022). Joint token and feature alignment framework for text-based person search. IEEE Signal Processing Letters, 29, 2238-2242. https://doi.org/10.1109/LSP.2022.3217682 |
MLA | Li, Shangze, et al. "Joint Token and Feature Alignment Framework for Text-Based Person Search." IEEE Signal Processing Letters, vol. 29, 2022, pp. 2238-2242. |