Cycle-Consistent Weakly Supervised Visual Grounding With Individual and Contextual Representations
Zhang, Ruisong (1,2); Wang, Chuang (1,2); Liu, Cheng-Lin (1,2)
Journal: IEEE TRANSACTIONS ON IMAGE PROCESSING
ISSN: 1057-7149
Year: 2023
Volume: 32; Pages: 5167-5180
Corresponding author: Zhang, Ruisong (zhangruisong2019@ia.ac.cn)
Abstract: Visual grounding, which aims to align image regions with textual queries, is a fundamental task for cross-modal learning. We study weakly supervised visual grounding, where only coarse-grained image-text pairs are available. Because fine-grained correspondence information is lacking, existing approaches often encounter matching ambiguity. To overcome this challenge, we introduce a cycle consistency constraint on region-phrase pairs, which strengthens correlated pairs and weakens unrelated pairs. This cycle pairing uses the bidirectional association between image regions and text phrases to alleviate matching ambiguity. Furthermore, we propose a parallel grounding framework in which backbone networks and subsequent relation modules extract individual and contextual representations, which are used to calculate context-free and context-aware similarities between regions and phrases, respectively. The two representations characterize individual visual/linguistic concepts and their inter-relationships, respectively, and complement each other to achieve cross-modal alignment. The whole framework is trained by minimizing an image-text contrastive loss and a cycle consistency loss. During inference, the two similarities are fused to give the final region-phrase matching score. Experiments on five popular visual grounding datasets demonstrate a noticeable improvement from our method. The source code is available at https://github.com/Evergrow/WSVG.
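The abstract does not state the loss formulas, so the following is only a minimal sketch of how a cycle consistency constraint on region-phrase pairs and a fusion of two similarity scores could look. Everything in it is an assumption made for illustration: the softmax soft assignment, the `temperature` parameter, the negative-log-diagonal loss, the weighted-sum fusion, and the function names `cycle_consistency_loss` and `fuse_scores`. It is not the authors' implementation, which is available at the repository linked above.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cycle_consistency_loss(regions, phrases, temperature=0.1):
    """Hypothetical cycle loss: phrase -> region -> phrase should return home.

    regions: (num_regions, dim) array of region features.
    phrases: (num_phrases, dim) array of phrase features.
    """
    # Cosine similarity between every phrase and every region.
    r = regions / np.linalg.norm(regions, axis=1, keepdims=True)
    p = phrases / np.linalg.norm(phrases, axis=1, keepdims=True)
    sim = p @ r.T  # shape (num_phrases, num_regions)

    # Soft assignments in both directions (assumed form, not from the paper).
    phrase_to_region = softmax(sim / temperature, axis=1)
    region_to_phrase = softmax(sim.T / temperature, axis=1)

    # Probability that a phrase maps to some region and back to each phrase.
    cycle = phrase_to_region @ region_to_phrase  # (num_phrases, num_phrases)

    # Penalise cycles that do not return to the starting phrase.
    diag = np.clip(np.diag(cycle), 1e-12, None)
    return float(-np.log(diag).mean())


def fuse_scores(sim_individual, sim_contextual, weight=0.5):
    """Hypothetical fusion of context-free and context-aware similarities."""
    return weight * sim_individual + (1.0 - weight) * sim_contextual


# Unambiguous case: each phrase matches exactly one region.
aligned = cycle_consistency_loss(np.eye(3), np.eye(3))
# Ambiguous case: all regions look identical, so every phrase matches all of them.
ambiguous = cycle_consistency_loss(np.ones((3, 3)), np.eye(3))
```

In this toy setup the ambiguous case yields a larger loss than the aligned case, which is the behaviour a cycle constraint is meant to reward: pairs that agree in both directions are strengthened, and one-sided matches are penalised.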
Keywords: Visualization; Grounding; Task analysis; Sports equipment; Image reconstruction; Transformers; Training; Weakly supervised learning; visual grounding; cycle consistency; individual and contextual representations
DOI: 10.1109/TIP.2023.3311917
Keywords [WOS]: LANGUAGE
Indexed by: SCI
Language: English
Funding projects: National Key Research and Development Program ; National Natural Science Foundation of China (NSFC)[2018AAA0100400] ; National Natural Science Foundation of China (NSFC)[U20A20223] ; Pioneer Hundred Talents Program of the Chinese Academy of Sciences (CAS)[61721004] ; [Y9S9MS08]
Funders: National Key Research and Development Program ; National Natural Science Foundation of China (NSFC) ; Pioneer Hundred Talents Program of the Chinese Academy of Sciences (CAS)
WOS research areas: Computer Science ; Engineering
WOS categories: Computer Science, Artificial Intelligence ; Engineering, Electrical & Electronic
WOS accession number: WOS:001070756500003
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Citation statistics
Times cited: 1 [WOS]
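Document type: Journal article
Item identifier: http://ir.ia.ac.cn/handle/173211/53033
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems
Corresponding author: Zhang, Ruisong
Author affiliations:
1. Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing 100190, Peoples R China
2. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
First author affiliation: Institute of Automation, Chinese Academy of Sciences
Corresponding author affiliation: Institute of Automation, Chinese Academy of Sciences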
Recommended citation
GB/T 7714: Zhang, Ruisong, Wang, Chuang, Liu, Cheng-Lin. Cycle-Consistent Weakly Supervised Visual Grounding With Individual and Contextual Representations[J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32: 5167-5180.
APA: Zhang, Ruisong, Wang, Chuang, & Liu, Cheng-Lin. (2023). Cycle-Consistent Weakly Supervised Visual Grounding With Individual and Contextual Representations. IEEE TRANSACTIONS ON IMAGE PROCESSING, 32, 5167-5180.
MLA: Zhang, Ruisong, et al. "Cycle-Consistent Weakly Supervised Visual Grounding With Individual and Contextual Representations". IEEE TRANSACTIONS ON IMAGE PROCESSING 32 (2023): 5167-5180.