With the development of Web 2.0, social websites such as Weibo, Flickr and YouTube have gradually become a novel platform for data generation and knowledge sharing. Over the last decade there has been a massive explosion of multimedia content such as text, images and videos. These different types of data are commonly used to describe the same semantics; for example, a Wikipedia article generally includes both text and images. Such multimodal data are semantically related, and each modality is complementary to the others. With the rapid growth of multimodal data, it is desirable to perform cross-modal data analysis for effective management and usage. In this paper, several methods are proposed for cross-modal data analysis and applied to the cross-modal retrieval task.

1. The main difficulty of cross-modal retrieval is how to measure the content similarity between different modalities of data. To address this problem, a joint graph regularized multi-modal subspace learning (JGRMSL) algorithm is proposed, which integrates inter-modality similarities and intra-modality similarities into a joint graph regularization to better explore the cross-modal correlation and the local manifold structure within each modality (a minimal sketch of this regularizer is given after this abstract). To obtain good class separation, the idea of Linear Discriminant Analysis (LDA) is incorporated by maximizing the between-class covariance and minimizing the within-class covariance of all projected data. Experimental results on two public cross-modal datasets demonstrate the effectiveness of the algorithm.

2. Since the dimensionality of real-world data is often high, there are naturally many redundant and irrelevant features. Hence, it is important to simultaneously select the relevant and discriminative features for the different modalities. Accordingly, a coupled spaces learning method is proposed to jointly perform common subspace learning and coupled feature selection. It learns two projection matrices that map the multimodal data into a common feature space, in which cross-modal matching can be performed. During learning, $\ell_{2,1}$-norm penalties are imposed on the two projection matrices separately, so that relevant and discriminative features are selected from the coupled feature spaces simultaneously (an illustrative objective appears in the second sketch below). A trace norm is further imposed on the projected data as a low-rank constraint, which ...
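To make the joint graph regularization in contribution 1 concrete, the following is a minimal NumPy sketch. It assumes paired image/text samples with class labels, a simple 0/1 k-nearest-neighbour weighting for intra-modality edges, and same-class links for inter-modality edges; the paper's exact affinity definitions and the LDA scatter terms are not reproduced here, and all function and parameter names are illustrative.

```python
import numpy as np

def joint_graph_laplacian(X_img, X_txt, labels, k=5):
    """Build the joint graph over image and text samples (hypothetical sketch).

    Intra-modality edges connect k-nearest neighbours within one modality;
    inter-modality edges connect samples that share the same semantic class.
    """
    n = X_img.shape[0]                       # number of paired samples
    W = np.zeros((2 * n, 2 * n))             # joint affinity matrix

    def knn_affinity(X):
        d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
        A = np.zeros((n, n))
        for i in range(n):
            nbrs = np.argsort(d[i])[1:k + 1]  # skip the sample itself
            A[i, nbrs] = 1.0
        return np.maximum(A, A.T)             # symmetrise

    W[:n, :n] = knn_affinity(X_img)           # intra-modality (image)
    W[n:, n:] = knn_affinity(X_txt)           # intra-modality (text)
    inter = (labels[:, None] == labels[None, :]).astype(float)
    W[:n, n:] = inter                         # inter-modality (shared semantics)
    W[n:, :n] = inter.T

    D = np.diag(W.sum(axis=1))
    return D - W                              # graph Laplacian L

def joint_graph_regularizer(U_img, U_txt, X_img, X_txt, L):
    """Regularization term tr(Y^T L Y) on the stacked projected data Y."""
    Y = np.vstack([X_img @ U_img, X_txt @ U_txt])
    return np.trace(Y.T @ L @ Y)
```

Minimizing this term pulls together projections that are neighbours within a modality or semantically linked across modalities, which is the role the abstract assigns to the joint graph regularization.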
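For contribution 2, the sketch below evaluates an illustrative form of the coupled objective: a Frobenius fitting term aligning the two projections, $\ell_{2,1}$-norm penalties for coupled feature selection, and a trace (nuclear) norm on the stacked projected data as the low-rank constraint. The fitting term and the weighting parameters are assumptions, not the paper's exact formulation.

```python
import numpy as np

def l21_norm(U):
    """Sum of row-wise l2 norms; rows driven to zero discard the
    corresponding input feature, which yields feature selection."""
    return np.sum(np.linalg.norm(U, axis=1))

def coupled_objective(U_img, U_txt, X_img, X_txt, lam1=0.1, lam2=0.1):
    """Illustrative coupled-spaces objective (names/weights are assumed).

    - Frobenius term aligns paired samples in the common space.
    - l21 penalties select features in each projection matrix.
    - trace norm on the stacked projections enforces low rank,
      coupling the two modalities.
    """
    Y_img, Y_txt = X_img @ U_img, X_txt @ U_txt
    fit = np.linalg.norm(Y_img - Y_txt, 'fro') ** 2
    sparsity = l21_norm(U_img) + l21_norm(U_txt)
    low_rank = np.linalg.norm(np.vstack([Y_img, Y_txt]), 'nuc')  # trace norm
    return fit + lam1 * sparsity + lam2 * low_rank
```

In practice such an objective would be minimized over `U_img` and `U_txt` with an iterative solver, since the $\ell_{2,1}$ and trace-norm terms are convex but non-smooth.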