Subspace Learning-based Multimedia Content Analysis and Understanding (基于子空间学习的多媒体内容分析与理解)


	基于子空间学习的多媒体内容分析与理解
Alternative title	Subspace Learning-based Multimedia Content Analysis and Understanding
	Li Zechao (李泽超)
	2013-05-30
Degree type	Doctor of Engineering
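Abstract (translated from Chinese)	With the rapid development of science and technology, the spread of digital mobile devices and the rise of social media, multimedia content is growing explosively. As a new generation of information resource, multimedia includes, besides traditional text, images, audio and video, which are expressive, information-rich and vivid. How to analyze and understand such massive multimedia information, rich in both form and content, has therefore become an active research problem. The feature representations of multimedia data are increasingly diverse, and their dimensionality is increasingly high. However, some features are too strongly correlated, carry too much redundant information, or are close to noise; in other words, not all features are discriminative. Selecting the feature representation that best reflects the intrinsic nature of the data is therefore important for multimedia content analysis and understanding. On the other hand, multimedia data are generally described by low-level features, and a so-called "semantic gap" separates these from high-level semantics. Fully mining the latent structure of multimedia data to learn a compact data representation, and building a semantic mapping between low-level features and high-level semantics, can narrow the semantic gap and effectively improve multimedia content analysis and understanding. In addition, different media are usually compatible and complementary in how they express semantics, and fully mining and exploiting this property can effectively improve the performance of multimedia content analysis and understanding. To address these problems, this thesis takes subspace learning as its main thread and studies multimedia content analysis and understanding from two perspectives: theory and methods (feature selection and semantic mapping) and practical applications (personalized social tag recommendation and news retrieval). The main research content and contributions are as follows:
1. Unsupervised feature selection based on nonnegative spectral clustering and latent structure learning. To address high feature dimensionality, noisy features and redundancy among features, this thesis proposes an unsupervised feature selection method that combines nonnegative spectral clustering analysis with latent structure analysis. During feature selection, a nonnegative spectral clustering algorithm learns the cluster indicator function of the samples (pseudo-labels), which supplies discriminative information for feature selection. In addition, a latent structure analysis algorithm mines the relationships among features, under the assumption that the latent structure is a low-dimensional linear subspace. By combining nonnegative spectral clustering with latent subspace learning, the method selects a strongly discriminative feature subset.
2. Multimedia semantic mapping based on robust structured subspace learning. To address the semantic gap between low-level features and high-level semantics, a robust structured subspace learning framework is proposed. It considers feature learning and semantic mapping jointly, learns an abstract representation of the data and links low-level features to high-level semantics, thereby narrowing the semantic gap. The subspace preserves not only the local topological structure of the original feature space but also local and global consistency at the label level. The method also incorporates a row-sparse model, which makes it robust to outliers and noise. The thesis applies the method to social image annotation, clustering, and semi-supervised and supervised classification, and obtains strong performance on all of them, which indicates that the method learns an effective feature representation for multimedia data.
3. Personalized social tag recommendation based on unified latent subspace learning. To help users manage and organize their personal images, a personalized social tag recommendation method is proposed that mines users' tagging histories and geographic location information. The method learns a unified latent subspace, mines each user's tagging preferences and the tagging tendencies associated with each geographic location, and builds a personalized (location-specific) association between low-level visual features and high-level semantic tags. For an image newly uploaded by a user, tags are recommended automatically through semantic and content-based retrieval that uses the user and location information.
4. Multimedia news retrieval based on latent factor analysis. To enable online news readers...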
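The semantic-mapping idea in contribution 2 (learn a compact subspace, then map it to tags) can be sketched roughly as follows. This is not the thesis's robust structured subspace learning method: it swaps in plain ridge regression followed by a truncated SVD, and it omits the locality-preserving, label-consistency and row-sparsity terms the abstract mentions. The function names, the `rank` and `lam` parameters and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_subspace_mapping(X, Y, rank, lam=1e-2):
    """Factor a ridge-regression map from features to tag scores as Q @ P.

    X: (n_samples, n_features) low-level features.
    Y: (n_samples, n_tags) tag relevance scores.
    Q projects features into a rank-dimensional subspace; P maps that
    subspace to tag scores. Q @ P is the best rank-`rank` approximation
    (in Frobenius norm) of the ridge weights, not the thesis's objective.
    """
    d = X.shape[1]
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)  # ridge weights (d, t)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    Q = U[:, :rank] * s[:rank]   # (d, rank): features -> subspace
    P = Vt[:rank]                # (rank, t): subspace -> tag scores
    return Q, P

def recommend_tags(x, Q, P, top_k=3):
    """Indices of the top_k highest-scoring tags for one feature vector."""
    scores = (x @ Q) @ P
    return np.argsort(scores)[::-1][:top_k]

# Synthetic data (an assumption): tag scores depend on a 3-dimensional subspace.
rng = np.random.default_rng(0)
n, d, t, r = 200, 20, 15, 3
X = rng.normal(size=(n, d))
Y = X @ rng.normal(size=(d, r)) @ rng.normal(size=(r, t))
Y = Y + 0.01 * rng.normal(size=(n, t))

Q, P = fit_subspace_mapping(X, Y, rank=r)
Z = X @ Q                                        # compact representation
rel_err = np.linalg.norm(Z @ P - Y) / np.linalg.norm(Y)
tags = recommend_tags(X[0], Q, P)
print(Z.shape, rel_err, tags)
```

Because the synthetic tags really do depend on a three-dimensional subspace, the rank-3 factorization should reproduce them closely; on real image features and tags the fit would depend on how well a linear low-rank model suits the data.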
English abstract	With the development of technology and the popularity of digital mobile devices and social media, multimedia content is increasing explosively. Besides text, multimedia presents rich and vivid content in the form of images, audio and video. Consequently, how to effectively analyze and understand this massive multimedia information, with its rich content and forms, has become an active research topic. Multimedia data are described by an increasing variety of features, and the dimensionality of these features keeps growing. However, not all features are helpful to performance: most are correlated or redundant with one another, and some are noisy. Uncovering, from these features, the data representations that reflect the intrinsic properties of the data can promote multimedia analysis and understanding. On the other hand, multimedia data are widely described by low-level features, and there exists the well-known "semantic gap" between high-level semantic meaning and low-level visual features. Fully exploiting the rich context information makes it possible to learn a compact representation for multimedia data, bridge low-level features and high-level semantics, and effectively improve multimedia content analysis and understanding. In addition, the information carried by multiple media is usually compatible and complementary in representing semantic information. Jointly mining and fusing this heterogeneous information can improve multimedia analysis and understanding. To address the above issues, this thesis, based on subspace learning, studies both theoretical methods (feature selection and semantic mapping) and applications (personalized tag recommendation and news retrieval). Our main contributions are summarized as follows.
1. Nonnegative spectral clustering and structural learning-based unsupervised feature selection. To handle high-dimensional, noisy and redundant features, we propose an unsupervised feature selection framework that jointly exploits nonnegative spectral clustering and latent structure analysis. In the feature selection procedure, we propose a novel nonnegative spectral clustering algorithm to learn the label indicator function, which provides discriminative information for feature selection. In addition, we propose to uncover a latent shared structure to mine the feature correlation, and assume that the latent structure is a low-dimensional linear subspace. The proposed method can effect...
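The pipeline described in contribution 1 (pseudo-labels from spectral clustering guide the choice of features) can be sketched roughly as follows. This is a simplified stand-in, not the thesis's algorithm: the pseudo-labels come from an ordinary spectral embedding without the nonnegativity constraint, the latent shared structure is omitted, and the row-sparse (l2,1-regularized) regression used to score features is an assumption, since the abstract does not say how features are ranked. All function names, parameters and the synthetic data are hypothetical.

```python
import numpy as np

def knn_affinity(X, k=10):
    """Symmetric k-nearest-neighbour graph with Gaussian edge weights."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)                # a point is not its own neighbour
    cols = np.argsort(d2, axis=1)[:, :k].ravel()
    rows = np.repeat(np.arange(X.shape[0]), k)
    sigma2 = np.mean(d2[rows, cols]) + 1e-12    # bandwidth from neighbour distances
    W = np.zeros_like(d2)
    W[rows, cols] = np.exp(-d2[rows, cols] / sigma2)
    return np.maximum(W, W.T)

def spectral_pseudo_labels(X, n_clusters, k=10):
    """Centred, rescaled spectral embedding used as soft pseudo-labels."""
    W = knn_affinity(X, k)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)                 # eigenvalues in ascending order
    F = d_inv_sqrt[:, None] * vecs[:, :n_clusters]
    F = F - F.mean(axis=0)
    return F * np.sqrt(F.shape[0]) / np.linalg.norm(F)

def l21_feature_scores(X, F, beta=1.0, n_iter=30, eps=1e-8):
    """Row norms of W minimising ||Xs W - F||_F^2 + beta * ||W||_{2,1}.

    Solved by iteratively reweighted least squares on standardized
    features Xs, so that row norms are comparable across features.
    """
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + eps)
    G, B = Xs.T @ Xs, Xs.T @ F
    W = np.linalg.solve(G + beta * np.eye(G.shape[0]), B)   # ridge initialisation
    for _ in range(n_iter):
        D = np.diag(1.0 / (2.0 * np.linalg.norm(W, axis=1) + eps))
        W = np.linalg.solve(G + beta * D, B)
    return np.linalg.norm(W, axis=1)

# Synthetic data (an assumption): features 0 and 1 separate three clusters,
# features 2 to 7 are pure noise.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
informative = np.repeat(centers, 30, axis=0) + rng.normal(size=(90, 2))
X = np.hstack([informative, rng.normal(size=(90, 6))])

F = spectral_pseudo_labels(X, n_clusters=3)
scores = l21_feature_scores(X, F)
top2 = np.argsort(scores)[::-1][:2]             # two highest-scoring features
print(sorted(top2.tolist()))
```

Features with large row norms in `W` are the ones that best predict the pseudo-labels; the l2,1 penalty pushes the rows of uninformative features toward zero, which is why the ranking can be read directly from `scores`.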
Keywords	Multimedia Analysis and Understanding; Image Tagging; Tag Recommendation; News Retrieval; Subspace Learning; Feature Learning; Factor Analysis
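Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6542
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	Li Zechao. Subspace Learning-based Multimedia Content Analysis and Understanding [D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2013.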

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20101801462804 (5719 KB)			Not yet open	CC BY-NC-SA