Text-to-Image Vehicle Re-Identification: Multi-Scale Multi-View Cross-Modal Alignment Network and a Unified Benchmark
Ding, Leqi (1,2); Liu, Lei (1,2); Huang, Yan (3); Li, Chenglong (1,2); Zhang, Cheng (4); Wang, Wei (5); Wang, Liang (3)
Journal: IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS
ISSN: 1524-9050
2024-01-16
Pages: 14
Corresponding author: Li, Chenglong (lcl1314@foxmail.com)
Abstract: Vehicle Re-IDentification (Re-ID) aims to retrieve the images most similar to a given query vehicle image from a set of images captured by non-overlapping cameras. It plays a crucial role in intelligent transportation systems and has made impressive advances in recent years. In real-world scenarios, text descriptions of the target vehicle can often be acquired through witness accounts, after which image queries for vehicle Re-ID must be searched for manually, which is time-consuming and labor-intensive. To solve this problem, this paper introduces a new fine-grained cross-modal retrieval task called text-to-image vehicle re-identification, which seeks to retrieve target vehicle images based on the given text descriptions. To bridge the significant gap between the language and visual modalities, we propose a novel Multi-scale multi-view Cross-modal Alignment Network (MCANet). In particular, we incorporate view masks and multi-scale features to align image and text features in a progressive way. In addition, we design the Masked Bidirectional InfoNCE (MB-InfoNCE) loss to enhance training stability and make the best use of negative samples. To provide an evaluation platform for text-to-image vehicle re-identification, we create a Text-to-Image Vehicle Re-Identification dataset (T2I VeRi), which contains 2465 image-text pairs from 776 vehicles with an average sentence length of 26.8 words. Extensive experiments conducted on T2I VeRi demonstrate that MCANet outperforms the current state-of-the-art (SOTA) method by 2.2% in rank-1 accuracy.
Keywords: Task analysis; Feature extraction; Visualization; Training; Electronic mail; Benchmark testing; Trajectory; Text-to-image vehicle re-identification; cross-modal alignment; multi-scale multi-view analysis; benchmark dataset
DOI: 10.1109/TITS.2023.3348599
Indexed by: SCI
Language: English
Funding project: National Natural Science Foundation of China
Funding organization: National Natural Science Foundation of China
WOS research area: Engineering ; Transportation
WOS subject category: Engineering, Civil ; Engineering, Electrical & Electronic ; Transportation Science & Technology
WOS accession number: WOS:001167345700001
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Document type: Journal article
Item identifier: http://ir.ia.ac.cn/handle/173211/55626
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems
Author affiliations:
1. Anhui Univ, Informat Mat & Intelligent Sensing Lab Anhui Prov, Anhui Prov Key Lab Multimodal Cognit Computat, Hefei 230601, Peoples R China
2. Anhui Univ, Sch Comp Sci & Technol, Hefei 230601, Peoples R China
3. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
4. Anhui Univ, Stony Brook Inst, Anhui Prov Key Lab Multimodal Cognit Computat, Hefei 230601, Peoples R China
5. Video Invest Detachment Hefei Publ Secur Bur, Hefei, Peoples R China
Recommended citation:
GB/T 7714: Ding, Leqi, Liu, Lei, Huang, Yan, et al. Text-to-Image Vehicle Re-Identification: Multi-Scale Multi-View Cross-Modal Alignment Network and a Unified Benchmark[J]. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024: 14.
APA: Ding, Leqi., Liu, Lei., Huang, Yan., Li, Chenglong., Zhang, Cheng., ... & Wang, Liang. (2024). Text-to-Image Vehicle Re-Identification: Multi-Scale Multi-View Cross-Modal Alignment Network and a Unified Benchmark. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 14.
MLA: Ding, Leqi, et al. "Text-to-Image Vehicle Re-Identification: Multi-Scale Multi-View Cross-Modal Alignment Network and a Unified Benchmark". IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS (2024): 14.