With the development of Web 2.0, social websites such as Weibo, Flickr and YouTube have gradually become a novel platform for data generation and knowledge sharing. Over the last decade there has been a massive explosion of multimedia content such as text, images and videos. These different types of data are commonly used to describe the same semantics; for example, a Wikipedia article generally includes both text and images. Such multimodal data are semantically related, and each modality is complementary to the others. With the rapid growth of multimodal data, it is desirable to perform cross-modal data analysis for effective management and usage. In this paper, several methods are proposed for cross-modal data analysis and applied to the cross-modal retrieval task.

1. The main difficulty of cross-modal retrieval is how to measure the content similarity between different modalities of data. To address this problem, a joint graph regularized multi-modal subspace learning (JGRMSL) algorithm is proposed, which integrates inter-modality similarities and intra-modality similarities into a joint graph regularization to better explore the cross-modal correlation and the local manifold structure within each modality (a minimal sketch of this regularizer is given after this abstract). To obtain good class separation, the idea of Linear Discriminant Analysis (LDA) is incorporated by maximizing the between-class covariance and minimizing the within-class covariance of all projected data. Experimental results on two public cross-modal datasets demonstrate the effectiveness of the algorithm.

2. Since the dimensionality of real-world data is often high, there are naturally many redundant and irrelevant features. Hence, it is important to simultaneously select the relevant and discriminative features for the different modalities. Accordingly, a coupled spaces learning method is proposed to jointly perform common subspace learning and coupled feature selection. It learns two projection matrices that map the multimodal data into a common feature space, in which cross-modal matching can be performed. During learning, $\ell_{2,1}$-norm penalties are imposed on the two projection matrices separately, so that relevant and discriminative features are selected from the coupled feature spaces simultaneously (an illustrative objective appears in the second sketch below). A trace norm is further imposed on the projected data as a low-rank constraint, which ...
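To make the joint graph regularization in contribution 1 concrete, the following is a minimal NumPy sketch. It assumes paired image/text samples with class labels, a simple 0/1 k-nearest-neighbour weighting for intra-modality edges, and same-class links for inter-modality edges; the paper's exact affinity definitions and the LDA scatter terms are not reproduced here, and all function and parameter names are illustrative.

```python
import numpy as np

def joint_graph_laplacian(X_img, X_txt, labels, k=5):
    """Build the joint graph over image and text samples (hypothetical sketch).

    Intra-modality edges connect k-nearest neighbours within one modality;
    inter-modality edges connect samples that share the same semantic class.
    """
    n = X_img.shape[0]                       # number of paired samples
    W = np.zeros((2 * n, 2 * n))             # joint affinity matrix

    def knn_affinity(X):
        d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
        A = np.zeros((n, n))
        for i in range(n):
            nbrs = np.argsort(d[i])[1:k + 1]  # skip the sample itself
            A[i, nbrs] = 1.0
        return np.maximum(A, A.T)             # symmetrise

    W[:n, :n] = knn_affinity(X_img)           # intra-modality (image)
    W[n:, n:] = knn_affinity(X_txt)           # intra-modality (text)
    inter = (labels[:, None] == labels[None, :]).astype(float)
    W[:n, n:] = inter                         # inter-modality (shared semantics)
    W[n:, :n] = inter.T

    D = np.diag(W.sum(axis=1))
    return D - W                              # graph Laplacian L

def joint_graph_regularizer(U_img, U_txt, X_img, X_txt, L):
    """Regularization term tr(Y^T L Y) on the stacked projected data Y."""
    Y = np.vstack([X_img @ U_img, X_txt @ U_txt])
    return np.trace(Y.T @ L @ Y)
```

Minimizing this term pulls together projections that are neighbours within a modality or semantically linked across modalities, which is the role the abstract assigns to the joint graph regularization.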
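For contribution 2, the sketch below evaluates an illustrative form of the coupled objective: a Frobenius fitting term aligning the two projections, $\ell_{2,1}$-norm penalties for coupled feature selection, and a trace (nuclear) norm on the stacked projected data as the low-rank constraint. The fitting term and the weighting parameters are assumptions, not the paper's exact formulation.

```python
import numpy as np

def l21_norm(U):
    """Sum of row-wise l2 norms; rows driven to zero discard the
    corresponding input feature, which yields feature selection."""
    return np.sum(np.linalg.norm(U, axis=1))

def coupled_objective(U_img, U_txt, X_img, X_txt, lam1=0.1, lam2=0.1):
    """Illustrative coupled-spaces objective (names/weights are assumed).

    - Frobenius term aligns paired samples in the common space.
    - l21 penalties select features in each projection matrix.
    - trace norm on the stacked projections enforces low rank,
      coupling the two modalities.
    """
    Y_img, Y_txt = X_img @ U_img, X_txt @ U_txt
    fit = np.linalg.norm(Y_img - Y_txt, 'fro') ** 2
    sparsity = l21_norm(U_img) + l21_norm(U_txt)
    low_rank = np.linalg.norm(np.vstack([Y_img, Y_txt]), 'nuc')  # trace norm
    return fit + lam1 * sparsity + lam2 * low_rank
```

In practice such an objective would be minimized over `U_img` and `U_txt` with an iterative solver, since the $\ell_{2,1}$ and trace-norm terms are convex but non-smooth.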