Research on Key Technologies of Content-Based Multimedia Information Retrieval

Other title	Research on key technologies of content based multimedia information retrieval
Author	Liu Yang (刘扬)
Date	2011-05-23
Degree type	Doctor of Engineering
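Chinese abstract	Content-based multimedia information retrieval is an important research direction in multimedia analysis and processing. It computes the content similarity between a query example submitted by the user and the examples in a database, then ranks the database examples automatically so that users can quickly find information of interest. Traditional multimedia information retrieval is based mainly on textual keywords and queries images and audio/video data through their textual annotations. This text-centric retrieval mode has clear shortcomings. First, because of the "semantic gap", no effective link can be established between the high-level semantics of textual keywords and the low-level features extracted during media processing, so this mode is ambiguous when analyzing and understanding multimedia content. Second, in large-scale multimedia databases, obtaining textual annotations costs a great deal of manpower and time. Content-based multimedia information retrieval has therefore received wide attention and research. This thesis analyzes content-based multimedia information retrieval in depth and studies several open problems: it examines the feature description of multimedia information in detail, builds a robust feature subspace model for image features, designs a bag-of-words retrieval framework based on audio and video features and explores fusing the two modalities; creates a coherency vocabulary structure for effective fusion and fast indexing of multiple audio features; proposes a novel visual semantic concept detection module that semantically associates textual and visual information; and explores learning the objective function of a ranking model based on multi-source information. The main work and contributions of this thesis are as follows:
1. To address current problems in image feature description and data modeling, and building on traditional sparse coding theory, a Topographic Subspace Model based on Key-coding Learning is proposed, which produces discriminative sparse descriptions of images. According to the label distribution of the samples, key-coding learning is classified as inductive transfer learning: it uses massive unlabeled auxiliary samples to address the shortage of labeled training samples in machine learning, and the auxiliary samples need not be independent and identically distributed with the training samples. Building a topographic subspace model on the unlabeled auxiliary samples models the distribution of the sample data accurately. In the topographic subspace, key-coding learning is applied to the large number of local feature descriptors extracted from each image, finally producing for each image a set of sparse feature vectors of arbitrary dimension that are both fast to compute and highly discriminative.
2. To strengthen audio-visual feature description and improve retrieval efficiency, this thesis proposes a bag-of-words based audio-visual retrieval framework. Unlike the traditional bag-of-words model, visual retrieval uses a semantic Bag-of-visual-concept-words Model. By detecting visual semantic concepts in video shots, this model builds a bag of video semantic keywords shot by shot along the time axis, effectively overcoming the "semantic gap" of the traditional bag-of-words model. For audio retrieval, this thesis proposes a Bag-of-audio-words Model together with a Coherency Vocabulary index structure over multiple audio features, which effectively fuses several audio features and enables fast retrieval. Within the bag-of-words framework, different late-fusion strategies are used to fuse video and audio information.
3. Using content-based multimedia retrieval techniques, this thesis designs and implements a question-answering system that answers users' textual questions with web videos. The system integrates several novel modules, including a visual concept detection module based on AdaBoost and the Z-grid training algorithm, a multimodal video copy detection (Video Copy Detect...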
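Contribution 2 above builds bag-of-words representations and combines audio and visual results with late-fusion strategies. The sketch below illustrates only the generic pipeline, under stated assumptions: a codeword vocabulary is already available (for instance from k-means), each descriptor is hard-assigned to its nearest codeword, histograms are compared by histogram intersection, and the two modalities are combined by a weighted sum of scores. All function names and the weight `w_audio` are hypothetical; this is not the thesis's Coherency Vocabulary or its exact fusion strategy.

```python
import numpy as np

def quantize(descriptors, vocabulary):
    """Hard-assign each descriptor (n x d) to its nearest codeword (vocabulary is k x d)."""
    # Squared Euclidean distance from every descriptor to every codeword, shape (n, k).
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bow_histogram(descriptors, vocabulary):
    """L1-normalized bag-of-words histogram (hypothetical helper)."""
    words = quantize(descriptors, vocabulary)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / max(hist.sum(), 1.0)

def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms: sum of bin-wise minima."""
    return float(np.minimum(h1, h2).sum())

def late_fusion(audio_scores, visual_scores, w_audio=0.5):
    """Assumed weighted-sum late fusion of per-item scores from two modalities."""
    return w_audio * np.asarray(audio_scores) + (1.0 - w_audio) * np.asarray(visual_scores)

# Usage: rank two database items for one query.
vocab = np.array([[0.0, 0.0], [1.0, 1.0]])
query = bow_histogram(np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 0.9]]), vocab)
db = [np.array([[0.0, 0.1], [0.2, 0.0]]), np.array([[1.0, 1.0], [0.8, 1.0]])]
audio = [histogram_intersection(query, bow_histogram(x, vocab)) for x in db]
visual = [0.9, 0.2]  # made-up visual scores, for illustration only
ranking = np.argsort(-late_fusion(audio, visual))
```

Because fusion happens on scores rather than features, each modality can keep its own vocabulary and similarity measure; the weight would normally be tuned on held-out queries.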
English abstract	Content-based multimedia information retrieval (CBMIR) is one of the most important research directions in multimedia analysis and processing. It aims to automatically retrieve the desired information from a multimedia database based on the content of the query. Traditional multimedia information retrieval techniques rely mainly on textual keywords and perform retrieval using textual tagging information. However, textual keywords cannot bridge the semantic gap between low-level features and high-level semantics, which leads to ambiguity in understanding and analyzing multimedia information. In addition, textual keywords are not always available for large-scale multimedia databases, because human annotation is time-consuming and labor-intensive. Therefore, CBMIR has attracted increasing research attention. In this thesis, we analyze the existing problems in CBMIR and address several challenging issues: building a robust subspace model for image features; designing a bag-of-words based retrieval framework and realizing seamless fusion between audio and visual information; generating a coherency vocabulary indexing structure for fusing and efficiently indexing multiple audio features; proposing a novel visual concept detection module to associate textual and visual information; and learning the objective function of a reranking model based on multiple information sources. The main contributions of this thesis are as follows:
1. We propose a key-coding learning based topographic subspace model built on sparse coding, which generates discriminative and sparse representations for images. According to the label distribution, key-coding learning is a form of inductive transfer learning, which addresses the shortage of training data in machine learning with the help of large-scale auxiliary unlabeled data. Here, the unlabeled data need not be independent and identically distributed with the labeled data. The topographic subspace model built from the unlabeled auxiliary data describes the data distribution accurately. In the topographic subspace, we generate a sparse feature vector for each image by key-coding a large number of local features, making a good compromise between efficiency and effectiveness.
2. In order to enhance the audio and video feature description and improve the retrieval efficiency, we propose a ...
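Contribution 1 sparsely encodes the many local descriptors of an image and summarizes them as one compact image vector. As a rough illustration of that general idea only, the following sketch uses plain L1-regularized sparse coding solved with ISTA (iterative shrinkage-thresholding) against a fixed dictionary `D`, assumed to have been learned beforehand (for example from unlabeled auxiliary data), followed by max pooling over descriptors. It is not the thesis's key-coding learning or topographic subspace model; `lam`, `n_iter`, and all function names are assumptions.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t*||.||_1: shrink every entry toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_code(x, D, lam=0.1, n_iter=200):
    """ISTA for min_a 0.5*||x - D a||^2 + lam*||a||_1, with D of shape (d, k)."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant: largest singular value of D, squared
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)  # gradient of the reconstruction term
        a = soft_threshold(a - grad / L, lam / L)
    return a

def encode_image(descriptors, D, lam=0.1):
    """Sparse-code every local descriptor, then max-pool absolute codes into one k-dim vector."""
    codes = np.array([sparse_code(x, D, lam) for x in descriptors])
    return np.abs(codes).max(axis=0)

# Usage with a toy identity dictionary, where ISTA reduces to soft thresholding.
D = np.eye(2)
image_vector = encode_image(np.array([[1.0, 0.05], [-0.05, -0.5]]), D, lam=0.1)
```

Max pooling keeps the output length fixed at the dictionary size regardless of how many descriptors an image has, which is one common way to turn per-descriptor sparse codes into a single image-level vector.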
Keywords	Multimedia Information Retrieval; Subspace Model; Video Copy Detection; Question-answer System
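Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6331
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	Liu Yang. Research on Key Technologies of Content-Based Multimedia Information Retrieval[D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences, 2011.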

Files in this item
File name/size	Document type	Version	Access	License
CASIA_20071801462805 (3319 KB)			Not yet open access	CC BY-NC-SA