Tell, Imagine, and Search: End-to-end Learning for Composing Text and Image to Image Retrieval
Zhang, Feifei [1,2,3]; Xu, Mingliang [4]; Xu, Changsheng [1,5,6]
Journal: ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS
ISSN: 1551-6857
Publication date: 2022-05-01
Volume: 18, Issue: 2, Pages: 23
Corresponding author: Zhang, Feifei (feifeizhang1231@gmail.com)
Abstract: Composing Text and Image to Image Retrieval (CTI-IR) is an emerging task in computer vision that retrieves images relevant to a query image paired with text describing desired modifications to that image. Most conventional cross-modal retrieval approaches take data of one modality as the query to retrieve relevant data of another modality. In contrast to these methods, in this article we propose an end-to-end trainable network for simultaneous image generation and CTI-IR. The proposed model is based on a Generative Adversarial Network (GAN) and has several merits. First, it can learn a generative and discriminative feature for the query (a query image with a text description) by jointly training a generative model and a retrieval model. Second, our model can automatically manipulate the visual features of the reference image according to the text description through adversarial learning between the synthesized image and the target image. Third, global-local collaborative discriminators and attention-based generators are exploited, allowing our approach to focus on both the global and local differences between the query image and the target image. As a result, the semantic consistency and fine-grained details of the generated images are enhanced in our model. The generated image can also be used to interpret and empower our retrieval model. Quantitative and qualitative evaluations on three benchmark datasets demonstrate that the proposed algorithm performs favorably against state-of-the-art methods.
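To make the retrieval setting in the abstract concrete, the following is a minimal sketch of composed-query ranking, not the authors' method. The paper learns the composition with a GAN-based network; here that step is replaced by a hypothetical fixed additive mix (`compose_query`), and the three-dimensional "features" and gallery names are invented toy values chosen only so the ranking is easy to follow.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def compose_query(image_feat, text_feat, text_weight=1.0):
    """Hypothetical stand-in for a learned composition module.

    Assumption: the text feature acts as an additive edit on the image
    feature. The paper instead learns this step adversarially.
    """
    mixed = [i + text_weight * t for i, t in zip(image_feat, text_feat)]
    return l2_normalize(mixed)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def rank_gallery(query, gallery):
    """Return gallery names sorted from most to least similar to the query."""
    scores = {name: cosine(query, feat) for name, feat in gallery.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy feature axes (made up for illustration): [is_dress, is_red, is_blue]
reference_image = [1.0, 1.0, 0.0]     # a red dress
modification_text = [0.0, -1.0, 1.0]  # "make it blue": remove red, add blue

gallery = {
    "red_dress": [1.0, 1.0, 0.0],
    "blue_dress": [1.0, 0.0, 1.0],
    "blue_shirt": [0.0, 0.0, 1.0],
}

query = compose_query(reference_image, modification_text)
ranking = rank_gallery(query, gallery)
print(ranking)
```

In this toy setup the composed query lands closest to the item that keeps the reference's shape and takes the requested colour, which is the behaviour a CTI-IR system is meant to achieve with learned features.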
Keywords: Composing text and image to image retrieval; end-to-end; image generation; generative adversarial network; global-local
DOI: 10.1145/3478642
Indexed by: SCI
Language: English
Funding projects: National Key Research and Development Program of China [2018AAA0100604]; National Natural Science Foundation of China [62036012]; National Natural Science Foundation of China [61720106006]; National Natural Science Foundation of China [62002355]; National Natural Science Foundation of China [61721004]; National Natural Science Foundation of China [61832002]; National Natural Science Foundation of China [62072455]; National Natural Science Foundation of China [U1705262]; National Natural Science Foundation of China [U1836220]; Key Research Program of Frontier Sciences of CAS [QYZDJ-SSW-JSC039]; National Postdoctoral Program for Innovative Talents [BX20190367]; Beijing Natural Science Foundation [L201001]
Funders: National Key Research and Development Program of China; National Natural Science Foundation of China; Key Research Program of Frontier Sciences of CAS; National Postdoctoral Program for Innovative Talents; Beijing Natural Science Foundation
WOS research area: Computer Science
WOS categories: Computer Science, Information Systems; Computer Science, Software Engineering; Computer Science, Theory & Methods
WOS accession number: WOS:000773689400012
Publisher: ASSOC COMPUTING MACHINERY
Seven major directions, sub-direction classification: Multimodal Intelligence
Citation statistics
Times cited: 7 [WOS]
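Document type: Journal article
Item identifier: http://ir.ia.ac.cn/handle/173211/48170
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems_Multimedia Computing
Corresponding author: Zhang, Feifei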
Author affiliations:
1. Chinese Acad Sci, Inst Automat, NLPR, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
2. Tianjin Univ Technol, Sch Comp Sci & Engn, 391 Bin Shui Xi Dao Rd, Tianjin 300384, Peoples R China
3. Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
4. Zhengzhou Univ, Sch Informat Engn, 100 Sci Ave, Zhengzhou 450001, Henan, Peoples R China
5. Univ Chinese Acad Sci, Sch Artificial Intelligence, 19 Yuquan Rd, Beijing 100049, Peoples R China
6. Peng Cheng Lab, 2 Xingke 1st St, Shenzhen 518000, Peoples R China
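First author affiliation: National Laboratory of Pattern Recognition
Corresponding author affiliation: National Laboratory of Pattern Recognition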
Recommended citation
GB/T 7714
Zhang, Feifei, Xu, Mingliang, Xu, Changsheng. Tell, Imagine, and Search: End-to-end Learning for Composing Text and Image to Image Retrieval[J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2022, 18(2): 23.
APA
Zhang, Feifei, Xu, Mingliang, & Xu, Changsheng. (2022). Tell, Imagine, and Search: End-to-end Learning for Composing Text and Image to Image Retrieval. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 18(2), 23.
MLA
Zhang, Feifei, et al. "Tell, Imagine, and Search: End-to-end Learning for Composing Text and Image to Image Retrieval." ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS 18.2 (2022): 23.