Image-Text Cross-Modal Retrieval via Modality-Specific Feature Learning
Jian Wang; Yonghao He; Cuicui Kang; Shiming Xiang; Chunhong Pan
Conference Name: ACM International Conference on Multimedia Retrieval
Conference Date: 2015-6
Conference Place: Shanghai, China
Abstract: Cross-modal retrieval extends the ability of search engines to handle massive cross-modal data. The goal of image-text cross-modal retrieval is to search images (texts) using text (image) queries by computing the similarities between images and texts directly. Many existing methods rely on low-level visual and textual features for cross-modal retrieval, ignoring the characteristics present in the raw data of each modality. In this paper, a novel model based on modality-specific feature learning is proposed. Considering the characteristics of the different modalities, the model uses two types of convolutional neural networks to map the raw data to latent space representations for images and texts, respectively. In particular, the convolution-based network used for texts involves word embedding learning, which has proven effective at extracting meaningful textual features for text classification. In the latent space, the mapped features of images and texts form relevant and irrelevant image-text pairs, which are used by the one-vs-more learning scheme. This learning scheme achieves ranking functionality by considering one relevant pair together with multiple irrelevant pairs. The standard back-propagation technique is employed to update the parameters of the two convolutional networks. Extensive cross-modal retrieval experiments are carried out on three challenging datasets that consist of image-document pairs or image-query click-through data from a search engine, and the results firmly demonstrate that the proposed model is much more effective.
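The abstract does not give the exact form of the one-vs-more objective, so the following is only a minimal sketch of one plausible reading: a softmax over the similarities of one relevant pair and several irrelevant pairs, with retrieval done by ranking candidates on similarity. The function names (`cosine`, `one_vs_more_loss`, `rank_candidates`), the cosine similarity, and the `scale` parameter are all assumptions made for illustration, and plain Python lists stand in for the latent vectors that the two convolutional networks would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two latent vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def one_vs_more_loss(query, relevant, irrelevant, scale=10.0):
    """Negative log-probability of the relevant item under a softmax over
    one relevant and several irrelevant candidates (hypothetical objective)."""
    sims = [cosine(query, relevant)] + [cosine(query, x) for x in irrelevant]
    logits = [scale * s for s in sims]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]


def rank_candidates(query, candidates):
    """Return candidate indices ordered from most to least similar to the query."""
    return sorted(
        range(len(candidates)),
        key=lambda i: cosine(query, candidates[i]),
        reverse=True,
    )


# Toy latent vectors: a text query and three image candidates.
query = [1.0, 0.0]
close_image = [0.9, 0.1]
far_images = [[0.0, 1.0], [-1.0, 0.0]]

loss_good = one_vs_more_loss(query, close_image, far_images)
loss_bad = one_vs_more_loss(query, far_images[0], [close_image, far_images[1]])
ranking = rank_candidates(query, [far_images[0], close_image, far_images[1]])
```

In this toy setup the loss is small when the relevant item is the one closest to the query and large otherwise, which is the property a ranking-oriented objective needs. In a trained model, the gradient of such a loss would be back-propagated through both networks.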
Document Type: Conference paper
Affiliation: National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
Recommended Citation
GB/T 7714
Jian Wang, Yonghao He, Cuicui Kang, et al. Image-Text Cross-Modal Retrieval via Modality-Specific Feature Learning[C], 2015.
Files in This Item:
File Name/Size: p347-wang.pdf (1411KB)
DocType: Conference paper
Access: Open access
License: CC BY-NC-SA