Local contexts here refer to properties that vary with spatial location. In the bag-of-visual-words (BoVW) model, for example, they can simply be the spatial locations of local features. To obtain richer contexts, one can model the relationships between neighboring local features, or extract some representation from the region surrounding each local feature. For the extraction, one can use either engineered or learned descriptors, and for the learning, one can apply either supervised or unsupervised algorithms. Once such representations are obtained, a question remains: how should they be used? Image understanding comprises tasks diverse in form, such as image classification (one label per image), image labeling (one label per pixel), and verification (one label per pair). Moreover, the way of leveraging contexts also depends on the particular approach taken to each task. Focusing on representing and leveraging local contexts in images, this thesis covers the four topics below.

1. Spatial directed graphs for image classification, based on BoVW. The context representations are simply the spatial locations of local features, according to which multiple local pooling operations are applied. Directed graphs replace the spatial pyramid, modeling not only the sliced blocks but also the relationships between them.

2. Contextual pooling for image classification, also based on BoVW. The context representations are extracted with BoVW from the regions adjacent to local features, and multiple local pooling operations are applied according to them. True context representations replace spatial locations, which helps discriminate ambiguous features that are not finely aligned.

3. Hierarchical contexts for image labeling, based on deep convolutional neural networks (CNNs). The network spontaneously learns hierarchical context representations, which are then used to classify pixels.

4. Learned local contexts for cross-view gait-based human identification, based on deep CNNs. The network learns context representations that are fed into a spontaneously learned comparator, which predicts similarities.
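To make the spatial pooling idea in topics 1 and 2 concrete, the sketch below shows the basic BoVW operation they build on: quantized local features are pooled into per-region histograms according to their spatial locations, and the regional histograms are concatenated into one image descriptor. This is a minimal illustration of location-based pooling only; the function name, the 2x2 grid, and the toy features are assumptions for the example, not the thesis's actual spatial-pyramid or directed-graph formulation.

```python
def bovw_spatial_histogram(features, vocab_size, grid=(2, 2)):
    """Pool quantized local features into per-cell histograms.

    features: list of (x, y, word_id), with x, y normalized to [0, 1).
    Returns the concatenation of one word histogram per grid cell.
    """
    gx, gy = grid
    hists = [[0] * vocab_size for _ in range(gx * gy)]
    for x, y, word in features:
        # assign the feature to a spatial cell by its location
        cell = int(x * gx) * gy + int(y * gy)
        hists[cell][word] += 1
    # concatenating cell histograms retains coarse spatial layout
    return [count for hist in hists for count in hist]

# toy image: 3 local features quantized against a 4-word vocabulary
feats = [(0.1, 0.2, 0), (0.6, 0.7, 2), (0.9, 0.1, 2)]
desc = bovw_spatial_histogram(feats, vocab_size=4)  # length 2*2*4 = 16
```

Topic 2 keeps this pooling machinery but replaces the raw `(x, y)` coordinates with context representations extracted from each feature's neighborhood, so that pooling groups features by what surrounds them rather than by where they sit.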
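Topic 4's verification setting (one label per pair) can be sketched as a siamese-style pipeline: a shared encoder maps each input to a context representation, and a comparator maps the pair of representations to a similarity score. The linear-plus-tanh encoder and the distance-based comparator below are stand-ins chosen for brevity; the thesis's encoder and comparator are both learned CNN components.

```python
import math

def encode(x, weights):
    # shared encoder: one linear layer with tanh squashing (stand-in
    # for the learned CNN feature extractor)
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
            for row in weights]

def comparator(a, b):
    # stand-in comparator: turn squared distance between the two
    # codes into a similarity in (0, 1]; 1 means identical codes
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2)

w = [[0.5, -0.2], [0.1, 0.4]]  # toy shared encoder weights
s_same = comparator(encode([1.0, 2.0], w), encode([1.0, 2.0], w))
s_diff = comparator(encode([1.0, 2.0], w), encode([-2.0, 0.5], w))
```

Because the encoder is shared, the two inputs are embedded in the same representation space, and training the comparator end-to-end lets it learn which differences between codes matter for deciding whether a pair depicts the same person.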