Research on Multi-View Learning Algorithms and Their Applications

	Research on Multi-View Learning Algorithms and Their Applications
	Yin Qiyue (尹奇跃)
	2017-05
Degree type	Doctor of Engineering
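Abstract (translated from Chinese)	Since the arrival of the big data era, big data has had a profound impact on every industry. At the same time, data now take increasingly diverse forms: a web page, for example, can be characterized by its images, text, and hyperlinks, and an image can be expressed through a variety of visual descriptors. Such different feature representations of the same entity are called multi-view data. The views may be heterogeneous multimodal data, such as the text and images of a web page, or homogeneous feature descriptors, such as the SIFT and GIST features of an image. Because the views describe a semantically identical entity yet differ from one another, multi-view data exhibit two basic properties, complementarity and consistency, which form the foundation of multi-view learning. Targeting several pressing tasks in current multi-view learning, this thesis proposes several multi-view data analysis methods and applies them to multi-view clustering and cross-view retrieval. The main contents are as follows. 1. Subspace segmentation methods based on spectral clustering yield structured representations of data and have important practical value, but they cannot handle multi-view data. To mine the relations among the structured representations of multi-view data, a multi-view learning method based on structured sparsity is proposed. The method first learns sparse representations of multi-view data with a sparse self-representation model, while modeling the correlations between views through structural constraints of differing strength. To further improve performance, prior information about the data is incorporated into the model to assist learning of the structured representations. Multi-view clustering experiments verify the effectiveness of the method. 2. The relation between the unified subspace representation and the view-dependent subspace representations has long been a difficulty in mining the consistency and complementarity of multi-view data. To alleviate this problem, a knowledge-graph-based multi-view learning method is proposed. The method treats inter-view and intra-view similarity relations as analogous to a knowledge graph and uses knowledge graph modeling to assist learning of high-level semantic representations of multi-view data. In modeling the relation between the unified and view-dependent subspace representations, a tensor operator matrix is introduced to fully exploit the complementarity and consistency across views. Multi-view clustering experiments show that the method learns a better unified subspace representation. 3. A large semantic gap often exists between heterogeneous views, making it difficult to mine relations in multi-view data. To narrow this gap, a multi-view learning method based on deep autoencoder networks is proposed. The method stacks several restricted Boltzmann machines to extract higher-level semantics from the different views (image and text), which narrows the semantic gap between heterogeneous views; autoencoder networks then encode the views so that the relations among them can be mined. To quantify complementarity and consistency, the coding layer of each view is split into two parts, corresponding to shared information and view-specific information respectively. Multi-view clustering and cross-view retrieval tasks demonstrate the effectiveness of the model. 4. Prior information, as known high-level semantic information about multi-view data, can guide multi-view learning to some extent. To explore its effect, a structure-constrained semi-supervised multi-view learning method is proposed. While optimizing the semantic class matrix of the multi-view data, the method directly uses the semantic annotations supplied by the prior information to guide learning of that matrix. The model handles two kinds of prior knowledge: partially observed semantic classes and observed links. Considering that the features of different views contribute differently to different semantic classes, and that features within a view differ in discriminative power, view selection and intra-view feature selection strategies are proposed. Semi-supervised multi-view clustering experiments verify the effectiveness of the method. 5. Real-world multi-view data often suffer from missing views, which degrades the performance of traditional multi-view learning. To alleviate this problem, a regression-based incomplete multi-view learning method is proposed. The method optimizes the semantic class matrix through data regression and uses the resulting semantic space to model the correlations among incomplete multi-view data. To handle high-dimensional, noisy multi-view data, a structured feature selection strategy is proposed and applied; in addition, inter-view and intra-view similarity relations are preserved while the semantic space is learned, strengthening the model's learning capacity. Multi-view clustering and cross-view retrieval tasks demonstrate the effectiveness of the learned semantic space.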
Abstract (English)	Since 2012, the arrival of the big data era has had a profound impact on almost every field. Beyond the growth in data volume, many types of representation are now available for semantically identical data. For example, a webpage can be described by images, text, and hyperlinks, and an image can be encoded by different kinds of visual descriptors. We call such data multi-view data, with each view corresponding to one type of feature set. In general, different views can be heterogeneous modalities, such as image and text, or homogeneous feature descriptors, such as SIFT and GIST. Because the views represent semantically identical data yet differ from one another, complementarity and consistency are the two basic characteristics of multi-view data and the basis of multi-view learning. Building on previous studies and the problems that remain open, this thesis proposes several multi-view learning methods and applies them to multi-view clustering and cross-view retrieval tasks. 1. Spectral-clustering-based subspace segmentation methods capture the structure of data, which is important for data representation, but they cannot be applied to multi-view data. To explore the relation between the structured representations of multiple views, a novel structure-constrained sparse multi-view learning method is developed. First, sparse representations of multi-view data are obtained by sparse self-representation, while several structure constraints of differing strength model the relation between views. Furthermore, prior knowledge is used to enhance the learning of the multi-view sparse representations. Multi-view clustering results on five public datasets show the effectiveness of the proposed algorithm. 2. For multi-view data, learning the connection between the unified embedding and the view-dependent embeddings has always been a difficult problem. To alleviate it, a novel knowledge-graph-based multi-view learning method is proposed. By analogy between multi-view data and a knowledge graph, each view can be regarded as one type of relation in the graph, so embedding methods for knowledge graphs can inform multi-view learning. To model the relation between the unified embedding and the view-dependent embeddings, a tensor operator is introduced to exploit complementarity and consistency. Extensive multi-view clustering experiments validate the proposed method. 3. The main difficulty of heterogeneous multi-view learning is the large semantic gap between views, which makes learning complementarity and consistency challenging. To reduce this gap, a deep-autoencoder-based multi-view learning method is proposed for two specific views, image and text. Several stacked restricted Boltzmann machines extract higher-level semantics, which reduces the semantic gap to some extent. Then, using two distinct autoencoder networks, the multi-view characteristics are mined through their coding layers. To quantify these characteristics, each coding layer is divided into two parts, one holding shared information and the other view-specific information. Multi-view clustering and cross-view retrieval experiments demonstrate that the proposed model performs better than typical competing methods. 4. Prior knowledge, serving as extra high-level semantic information, provides natural guidance for multi-view learning. To exploit such priors, a structure-constrained semi-supervised multi-view learning method is proposed. The model directly optimizes the cluster indicator matrix, which is an intuitive reflection of data semantics. In addition, priors that represent partially observed true semantics can be used directly to guide the learned data semantics. Furthermore, feature learning simultaneously selects views for each semantic label and discriminative features within each view for all labels. Extensive semi-supervised multi-view clustering results validate the proposed algorithm. 5. In the real world, multi-view data are often incomplete, that is, some examples have incomplete feature sets, which degrades the performance of traditional multi-view learning methods designed for complete data. To alleviate this problem, a novel unsupervised regression-based incomplete multi-view learning method is developed. The proposed method learns the cluster indicator matrix through several linear projection matrices, which establishes a bridge between incomplete feature sets. In addition, feature learning is used to deal with high-dimensional and noisy features. Furthermore, to enhance the learned representations, inter-view and intra-view data similarities are preserved through a multi-view graph regularization. Extensive experiments on multi-view clustering and cross-view retrieval show the advantages of the proposed method.
Keywords	Multi-view learning; multi-view clustering; cross-view retrieval
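Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/14748
Collection	Graduates_Doctoral Dissertations
Author affiliation	Institute of Automation, Chinese Academy of Sciences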
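Recommended citation (GB/T 7714)	Yin Qiyue (尹奇跃). Research on Multi-View Learning Algorithms and Their Applications (多视角学习算法及其应用研究) [D]. Beijing: Graduate University of the Chinese Academy of Sciences (中国科学院研究生院), 2017.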

Files in this item
File name/size	Document type	Version type	Access type	License
多视角学习算法及其应用研究.pdf (7748 KB)	Thesis		Restricted access	CC BY-NC-SA