Research on Multi-Domain Learning and Its Applications in Retrieval, Clustering and Classification


Author	梁坚 (Jian Liang)
Date	2018-12-05
Pages	170
Degree type	Doctoral
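Chinese abstract	With the rapid development of the information society and the rapid spread of platforms such as the mobile Internet, data worldwide is growing explosively, and we have entered the era of big data. Researchers at IBM summarized four main characteristics of big data (https://www.ibmbigdatahub.com/infographic/four-vs-big-data): Volume, Velocity, Veracity and Variety. Take variety as an example: nearly every Internet news report today contains several modalities, such as video, images and text descriptions, each of which can to some extent represent the story. Moreover, as the Chinese saying goes, "the benevolent see benevolence and the wise see wisdom": different news outlets report the same story from different perspectives. Such different forms of expression describing the same object are called multi-view data. More generally, when there is no one-to-one correspondence between modalities or views, we refer to the data more broadly as multi-domain data, where each domain contains a different form of expression of data from the same or similar categories. Targeting several open problems in computer vision and multimedia analysis, this thesis proposes several multi-domain learning algorithms and applies them to mainstream problems such as cross-modal retrieval, multi-view clustering and domain adaptation. The main results of this thesis are as follows:
1. Supervised cross-modal retrieval seeks a good heterogeneous metric for the similarity between feature representations of different modalities, so that features from different modalities whose semantic labels are more (less) similar receive a higher (lower) similarity. For the two common modalities of image and text, this thesis explores anchor points shared across modalities to build a principal affinity representation, makes full use of the supervision in the semantic labels from a classification perspective, and finally obtains semantically consistent feature representations for retrieval. To mitigate the low retrieval efficiency of real-valued features on massive data, the thesis then proposes a cross-modal hashing algorithm, in which binary codes learned from the more semantically informative modality act as a teacher that supervises the learning of the hash function for the other modality. Experiments on several standard databases verify the effectiveness of these supervised cross-modal retrieval algorithms.
2. Unlike supervised algorithms, unsupervised cross-modal algorithms obtain supervision only from the pairing between modalities, which makes semantics-based cross-modal retrieval harder. To cope with missing semantic labels, this thesis proposes a cross-modal learning method based on a group-invariant structure. Building on canonical correlation analysis (CCA), it introduces an additional latent group variable so that the two projected modalities and this latent variable remain consistent at the same time. In addition, modality pairs differ in how difficult they are when learning the latent variable, which inevitably harms that learning. The thesis therefore proposes an easy-to-hard learning strategy that jointly learns the latent semantic labels and a regression function from the features to the semantic space. Unsupervised cross-modal retrieval experiments show that both algorithms achieve excellent retrieval performance, even approaching some supervised algorithms.
3. Multi-view learning methods fuse observations from different views at the feature level or the score level, learning a unified representation or classifier for clustering. To remove semantically irrelevant information and inter-view redundancy from multi-view data, this thesis proposes a multi-view clustering method based on two-layer discriminative dimensionality reduction. The method first uses the correlation between views to discard features that differ too much across views and are irrelevant to clustering. It then applies the Fisher discriminant criterion in a second dimensionality reduction to further eliminate the redundancy remaining after the first layer, and learns a new cluster indicator variable that is fed back to the first layer to redo its dimensionality reduction. To check whether the reduced unified representation is effective, the thesis further analyzes the method in a classification setting. Experiments show that the method not only performs well on multi-view clustering but also yields representations with good classification ability.
4. Domain adaptation aims to avoid the high cost of annotating target-domain data by using the supervision of the source domain together with unlabeled target-domain data to learn an effective target-domain classifier. This thesis first designs a domain-independent clustering objective as the criterion for learning a domain-invariant projection; this criterion can in fact be viewed as combining intra-domain clustering with the maximum mean discrepancy (MMD), which measures the discrepancy between domains. Next, to account for the uncertainty of target pseudo-labels, it designs a more accurate cross-domain discrepancy measure and proposes a progressive adaptation method that gradually adds target samples with more confident pseudo-labels during learning to obtain the final projection. Finally, to avoid the time cost of the high-dimensional MMD matrices introduced by the preceding methods, the thesis proposes a fast and simple domain adaptation baseline based on the nearest class mean classifier. Results on several standard cross-domain databases confirm the effectiveness of these methods.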
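Item 1 motivates cross-modal hashing by the low retrieval efficiency of real-valued features. The following is a minimal sketch, under stated assumptions, of why binary codes are cheap to search: once both modalities are mapped to codes of the same length, ranking reduces to XOR and bit counting. The sign-of-projection `hash_code` and the helpers `hamming` and `rank_by_hamming` are hypothetical illustrations; the teacher-supervised hash functions learned in the thesis are not reproduced here.

```python
# Hedged sketch: ranking database items by Hamming distance between binary codes.
# Assumptions: codes are Python ints (bit k = k-th hash bit) and hash_code is a
# hypothetical sign-of-linear-projection hash, not the learned hash function of the thesis.

def hash_code(features, projections):
    # Bit k is set when the dot product with projection vector k is non-negative.
    code = 0
    for k, w in enumerate(projections):
        if sum(f * wk for f, wk in zip(features, w)) >= 0:
            code |= 1 << k
    return code


def hamming(a, b):
    # XOR keeps the differing bits; counting the 1s gives the Hamming distance.
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code, database_codes):
    # Database indices sorted by Hamming distance to the query (ties keep input order).
    return sorted(range(len(database_codes)),
                  key=lambda i: hamming(query_code, database_codes[i]))


# Usage: an image query against a small database of text codes of the same length.
image_projections = [[1, 0], [0, 1], [1, 1]]  # hypothetical image-side hash
query = hash_code([1.0, -2.0], image_projections)  # -> 0b001
text_codes = [0b110, 0b011, 0b001]  # hypothetical text-side codes
print(rank_by_hamming(query, text_codes))  # -> [2, 1, 0]
```

Comparing two codes costs one XOR and a bit count regardless of the original feature dimension, which is the efficiency argument behind replacing real-valued features with binary codes.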
English abstract	With the rapid development of the information society and the mobile Internet, data around the world is growing explosively, ushering in the era of big data. Researchers from IBM summarized four features of big data: Volume, Velocity, Veracity and Variety. Regarding variety, almost every Internet news report today contains videos, pictures, text descriptions and other modalities, each of which characterizes the news to some degree. Besides, as the saying goes, "the donkey means one thing, and the driver another": different media outlets take different perspectives even on the same story. Such different representations of the same object are called multi-view data. More generally, when no one-to-one correspondence between views exists, the data are generalized as multi-domain data, where each domain contains different representations of the same or similar categories. Aiming at several tasks in computer vision and multimedia analysis, this thesis puts forward several multi-domain learning algorithms and applies them to popular problems such as cross-modal retrieval, multi-view clustering and domain adaptation. The main results are summarized as follows:
1. Supervised cross-modal retrieval attempts to learn a metric for the similarity between different modalities, such that heterogeneous features with large (small) semantic similarity have large (small) similarity. For the image and text modalities, this thesis constructs non-linear features by exploring anchor points shared across modalities, exploits the semantic tags via classification, and finally obtains semantically consistent features. A cross-modal hashing method is then proposed to alleviate the low efficiency of real-valued features in massive-data retrieval: the semantic information in one modality is used to obtain binary codes that supervise the learning of the hash function in the other modality. The experimental results verify the effectiveness of these supervised cross-modal retrieval methods.
2. Unlike supervised methods, unsupervised methods obtain supervisory information only from the matching relationship between modalities, which makes semantics-based cross-modal retrieval more difficult. To address the missing semantic tags, a group-invariant structure is proposed. Building on canonical correlation analysis (CCA), it introduces an additional latent group variable so that the two projected modalities and the latent variable are consistent at the same time. On the other hand, since modality pairs differ in how difficult they are for learning the latent variable, we propose an 'easy'-to-'difficult' learning strategy that jointly learns the latent semantic tags and the regression functions. The experimental results show that these two unsupervised methods obtain excellent retrieval results that even approach some supervised methods.
3. Multi-view learning attempts to integrate information from different views at the feature or score level, learning a unified feature or classifier to perform clustering. To eliminate semantically irrelevant features and view redundancy in multi-view data, a coupled discriminative dimensionality reduction method is proposed. It first uses the correlation between views to remove noisy features that are unrelated to the clustering structure. A second dimensionality reduction then eliminates the redundant information remaining after the first layer via the Fisher discriminant criterion, learns a new cluster indicator variable, and returns it to the first layer to guide its dimensionality reduction. To verify the validity of the learned unified feature, the classification accuracy of this method is further analyzed. The experimental results show that the method performs well in both multi-view clustering and classification tasks.
4. Domain adaptation attempts to avoid the high cost of labeling target-domain data by using the source domain's supervisory information, together with unlabeled target data, to learn an effective target-domain classifier. We first design a domain-independent clustering objective that can be regarded as integrating the maximum mean discrepancy (MMD) with intra-domain clustering. Then, to address the uncertainty of target pseudo-labels, a progressive learning method is put forward in which the most confidently labeled target samples are added first. Finally, to reduce the time cost of the high-dimensional MMD matrices, we exploit the nearest class mean classifier and propose a fast and simple domain adaptation baseline. Results on multiple standard cross-domain databases confirm the effectiveness of these methods.
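Item 4 pairs MMD-based alignment with a nearest class mean (NCM) baseline. The following is a hedged sketch of those two ingredients only, not the thesis's algorithm. With a linear kernel, squared MMD reduces to the squared distance between the two domain means, so simply mean-centering each domain (an assumption chosen here for brevity) drives it to zero. An NCM classifier built from source class means then pseudo-labels the target and is refined with target class means. The functions `center`, `linear_mmd2`, `ncm_predict` and `ncm_self_training` are hypothetical names.

```python
# Hedged sketch: linear-kernel MMD plus a nearest-class-mean (NCM) baseline.
# Assumptions: features are equal-length lists of floats, the kernel is linear,
# and each domain is simply mean-centered; all function names are hypothetical.

def mean(vectors):
    # Column-wise mean of a list of vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    # Squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def center(vectors):
    # Subtract the domain mean from every sample.
    m = mean(vectors)
    return [[v - mv for v, mv in zip(x, m)] for x in vectors]


def linear_mmd2(source, target):
    # With a linear kernel, squared MMD equals the squared distance between domain means.
    return sq_dist(mean(source), mean(target))


def ncm_predict(x, centroids):
    # Label of the closest class mean.
    return min(centroids, key=lambda c: sq_dist(x, centroids[c]))


def ncm_self_training(xs, ys, xt, max_iter=10):
    """Center both domains, initialise class means from labeled source data,
    then alternate target pseudo-labeling and class-mean re-estimation."""
    xs, xt = center(xs), center(xt)
    classes = sorted(set(ys))
    centroids = {c: mean([x for x, y in zip(xs, ys) if y == c]) for c in classes}
    labels = [ncm_predict(x, centroids) for x in xt]
    for _ in range(max_iter):
        # Re-estimate each class mean from target samples carrying that pseudo-label;
        # keep the previous mean if a class received no target samples.
        centroids = {
            c: mean([x for x, l in zip(xt, labels) if l == c]) if c in labels else centroids[c]
            for c in classes
        }
        new_labels = [ncm_predict(x, centroids) for x in xt]
        if new_labels == labels:  # pseudo-labels are stable
            break
        labels = new_labels
    return labels


# Usage: two source classes at x = 0 and x = 4; the target is the same data shifted by +10.
xs = [[0, 0], [0, 1], [4, 0], [4, 1]]
ys = [0, 0, 1, 1]
xt = [[a + 10, b] for a, b in xs]
print(linear_mmd2(xs, xt))                  # -> 100.0
print(linear_mmd2(center(xs), center(xt)))  # -> 0.0
print(ncm_self_training(xs, ys, xt))        # -> [0, 0, 1, 1]
```

Centering is the crudest possible alignment and only removes a shift between domain means; it is used here because it makes the link between linear MMD and class means easy to see, and because NCM needs nothing more than per-class averages, which keeps the baseline cheap.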
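Keywords	Multi-domain learning; cross-modal retrieval; subspace learning; multi-view clustering; domain adaptation
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/23802
Collection	Graduates_Doctoral Dissertations
Corresponding author	梁坚 (Jian Liang)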
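Recommended citation (GB/T 7714)	梁坚 (Jian Liang). Research on Multi-Domain Learning and Its Applications in Retrieval, Clustering and Classification[D]. Institute of Automation, Chinese Academy of Sciences, 2018.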

Files in this item
File name/size	Document type	Version	Access	License
Thesis.pdf (10072 KB)	Thesis		Restricted	CC BY-NC-SA