Multimedia Content Analysis and Personalized Retrieval (多媒体内容分析与个性化检索)

Other title	Multimedia Content Analysis and Personalized Retrieval
Author	Zhang Xiaoyu (张晓宇)
Date	2010-05-31
Degree type	Doctor of Engineering
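Abstract (Chinese)	With the progress of science and technology, and especially the rapid development of multimedia and computer network technologies, multimedia information such as images and videos, with its vivid, intuitive and diverse forms of presentation, has become an indispensable part of people's work and lives, and large amounts of new multimedia data emerge every day. Faced with such a vast and heterogeneous body of multimedia information, building an effective set of content analysis and retrieval techniques is an urgent problem. This dissertation focuses on two kinds of multimedia objects, general images and blog videos, and studies the theory and methods of multimedia content analysis and personalized retrieval. By fully mining the inherent relationships among data, making effective use of massive and wide-ranging web resources, leveraging existing, relatively mature search technologies, and combining them with machine learning methods, we establish a reasonable mapping between low-level features and high-level semantics, helping computers understand the content of images/videos, as well as users' personalized query preferences and intentions, more accurately and efficiently. The main work and contributions of this dissertation are as follows: (1) We study in depth the positive role of active learning in relevance feedback for image retrieval. After summarizing and analyzing the strengths and weaknesses of existing work, we propose a dynamic batch mode for selective sampling. Using a "label one by one, train in batch" scheme, it selects subsequent samples in a targeted way by considering information from both the current classification model and the previously labeled samples, effectively balancing performance and efficiency. Based on the dynamic batch mode, we propose three concrete sample selection strategies: the boundary moving strategy, the certainty propagation strategy and the dynamic version space reduction strategy. Each starts from a different perspective, mines the relationships among samples in depth, and uses them to guide sample selection. (2) Targeting the characteristics of multi-label image classification and building on existing solutions, we propose a multi-view two-dimensional active learning algorithm that integrates multi-view learning with active learning. By exploiting correlation and redundancy along three dimensions (samples, labels and views), it selects the most informative "sample-label" pairs for annotation. Within each view, we use two-dimensional active learning to compute the uncertainty of each sample-label pair separately; across views, we compute the cross-view uncertainty of each pair by multi-view fusion; the overall uncertainty fuses the within-view and across-view uncertainties. By introducing multi-view learning into active learning, we both effectively reduce information redundancy and substantially reduce the amount of data that must be labeled. (3) Targeting the characteristics of blog videos, we propose a blog video management framework with two main modules: content analysis and personalized retrieval. In content analysis, we analyze blog video content from both the semantic and the sentiment perspectives. For semantic annotation, we extract semantic information from the text of the video blog itself and use external resources to supplement and refine the existing annotations, obtaining high-quality semantic annotations. For sentiment analysis, we summarize the comments of different viewers so that users can grasp the overall opinion of other viewers at a glance. The semantic annotations and sentiment evaluations produced by content analysis support subsequent retrieval and browsing. In personalized retrieval, after analyzing the respective strengths and weaknesses of text-based and content-based retrieval, we propose a novel "retrieval-recommendation" mode that divides blog video retrieval into two stages, explicit retrieval and implicit recommendation, so that text-based and content-based retrieval work together while the system remains simple for users to operate. To make similarity matching results better match user perception, starting from findings in cognitive psychology, we propose...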
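The dynamic batch mode summarized above (pick and label samples one at a time, retrain the classifier once per batch, and let earlier labels influence later picks) can be illustrated with a minimal Python sketch. It assumes a fixed decision function standing in for the current classifier and a Gaussian "certainty propagation" rule; that rule and the names `dynamic_batch_select`, `decision`, `oracle` and `sigma` are hypothetical illustrations, not the thesis's actual strategies.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dynamic_batch_select(
    pool: List[Vector],
    decision: Callable[[Vector], float],
    oracle: Callable[[int], int],
    batch_size: int,
    sigma: float = 1.0,
) -> List[Tuple[int, int]]:
    """Select and label up to `batch_size` pool samples one at a time.

    The classifier behind `decision` is NOT retrained inside this loop;
    the caller retrains once on the whole returned batch ("batch training").
    """
    # Certainty from the current model: a small |f(x)| means x is near the boundary.
    certainty = [abs(decision(x)) for x in pool]
    remaining = list(range(len(pool)))
    chosen: List[Tuple[int, int]] = []
    for _ in range(min(batch_size, len(pool))):
        # Least certain remaining sample (ties go to the lowest index).
        idx = min(remaining, key=lambda i: certainty[i])
        label = oracle(idx)  # "label one by one"
        chosen.append((idx, label))
        remaining.remove(idx)
        # Hypothetical propagation rule: samples close to the newly labeled one
        # gain certainty (Gaussian kernel of width sigma), steering the next
        # pick away from near-duplicates of what was just labeled.
        for i in remaining:
            certainty[i] += math.exp(-sq_dist(pool[i], pool[idx]) / (2 * sigma ** 2))
    return chosen


# Toy 1-D pool with a linear boundary at 0.
pool = [[0.1], [0.15], [2.0], [-0.5]]
picked = dynamic_batch_select(
    pool,
    decision=lambda x: x[0],
    oracle=lambda i: 1 if pool[i][0] > 0 else -1,
    batch_size=2,
    sigma=0.5,
)
print(picked)  # [(0, 1), (3, -1)]
```

Plain uncertainty sampling would pick samples 0 and 1, which are near-duplicates; letting the first label raise the certainty of its neighbours makes the second pick come from the other side of the boundary.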
Abstract (English)	With the rapid development of computer and Internet technology, multimedia information such as images and videos plays an important and indispensable role in people's everyday lives. The huge amount of multimedia data makes it very hard for users to efficiently access what they need, and effective multimedia content analysis and retrieval is the key to solving this information overload problem. In this dissertation, we study the content analysis and personalized retrieval of images and blog videos. Building on the intrinsic interrelations among multimedia data, we further incorporate useful knowledge embedded in large-scale web resources to bridge the gap between low-level features and high-level semantic concepts. The goal is to better understand the content of multimedia data and more precisely capture users' preferences. The main contributions of this dissertation are as follows: (1) We show that active learning is of great help for relevance feedback in content-based image retrieval. Inspired by related work, we propose a novel dynamic batch mode for selective sampling in active learning. Through one-by-one labeling and batch training, the selection of unlabeled examples is no longer determined solely by the existing classification boundary but also depends on the previously labeled examples. Based on the dynamic batch mode, we further present three sample selection strategies: the boundary moving strategy, the certainty propagation strategy and the dynamic version space reduction strategy. These strategies effectively guide the selection of informative samples. (2) Multi-label image classification is very challenging because of the heavy demand for human annotation of multi-label samples. We propose a multi-view two-dimensional active learning algorithm that integrates the mechanisms of active learning and multi-view learning. On the one hand, we explore the sample and label uncertainties within each view; on the other hand, we capture the uncertainty across different views through multi-view fusion. The overall uncertainty along the sample, label and view dimensions is then used to identify the most informative sample-label pairs. The combination of multi-view learning and active learning proves effective for redundancy reduction. (3) We propose an effective approach to video blog (vlog) content analysis, including semantic annotation and sentiment analysis. In order to acquire high-quality annotation for a vlog, we first e...
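The idea of fusing within-view and across-view uncertainty for each sample-label pair can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the thesis's algorithm: each view is assumed to output an independent per-label probability, within-view uncertainty is taken as average binary entropy, across-view uncertainty as rescaled variance (view disagreement), and the weight `alpha` plus all function names are hypothetical. Label correlations, which a two-dimensional formulation can exploit, are ignored here.

```python
import math
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (sample id, label name)


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli prediction: 1.0 at p = 0.5, near 0 as p nears 0 or 1."""
    p = min(max(p, 1e-12), 1 - 1e-12)
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def pair_uncertainty(view_probs: List[float], alpha: float = 0.5) -> float:
    """Fuse within-view and across-view uncertainty for one sample-label pair."""
    n = len(view_probs)
    # Within each view: how unsure that view's classifier is, averaged over views.
    within = sum(binary_entropy(p) for p in view_probs) / n
    # Across views: how much the views disagree. The variance of values in
    # [0, 1] is at most 0.25, so multiplying by 4 rescales it to [0, 1].
    mean_p = sum(view_probs) / n
    across = 4 * sum((p - mean_p) ** 2 for p in view_probs) / n
    return alpha * within + (1 - alpha) * across


def select_pairs(probs: Dict[Pair, List[float]], k: int, alpha: float = 0.5) -> List[Pair]:
    """Return the k sample-label pairs with the highest fused uncertainty."""
    ranked = sorted(probs, key=lambda pair: pair_uncertainty(probs[pair], alpha), reverse=True)
    return ranked[:k]


probs = {
    ("img1", "cat"): [0.5, 0.5],    # both views unsure, but they agree
    ("img1", "dog"): [0.9, 0.1],    # each view fairly confident, views disagree
    ("img2", "cat"): [0.95, 0.97],  # confident and consistent
}
print(select_pairs(probs, k=2))  # [('img1', 'dog'), ('img1', 'cat')]
```

A variance term is used for disagreement rather than a Jensen-Shannon term because mean entropy plus Jensen-Shannon divergence equals the entropy of the averaged prediction, so an equally weighted sum of the two would collapse to a single-view-style score.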
Keywords	Image/video content analysis; semantic annotation; image/video retrieval; relevance feedback
Language	Chinese
Document type	Dissertation
Item identifier	http://ir.ia.ac.cn/handle/173211/6276
Collection	Graduates_Doctoral Dissertations
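Recommended citation (GB/T 7714)	Zhang Xiaoyu. Multimedia Content Analysis and Personalized Retrieval [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences, 2010.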

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20071801462807 (5479 KB)			Restricted access	CC BY-NC-SA