Image Understanding with Latent Variables (基于潜在变量的图像理解研究)

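Title	Image Understanding with Latent Variables (基于潜在变量的图像理解研究)
Alternative title	Image Understanding with Latent Variables
Author	余轶南 (Yu Yinan)
Date	2012-06-06
Degree type	Doctor of Engineering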
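Chinese abstract (English translation)	Image matching, image classification, and object detection are fundamental problems and key components of image understanding in computer vision and pattern recognition. They underpin research on camera calibration, image-based 3D object reconstruction, content-based image retrieval, object tracking, and action recognition. Progress on them also directly affects practical applications such as intelligent video surveillance, understanding and management of web image content, and large-scale video data mining. Studying them therefore has both theoretical significance and practical value. Current research builds mainly on image feature extraction grounded in image processing and on machine learning theory grounded in mathematical statistics. This research has made remarkable progress over the past years, and in some areas has produced fairly satisfactory results and practical applications. In image understanding, however, current results still fall well short of the recognition ability of the human brain. The gap shows in the discriminative power and robustness of computer vision algorithms under changes in the viewpoint, illumination, and spatial position of an image or object, and under occlusion between objects. These remain among the harder research problems in image understanding. Latent-variable pattern recognition is a relatively new modeling approach. A latent variable model considers the observable data of an image while also modeling its latent data, and takes both into account during model learning, yielding a model that is more robust and more discriminative. This thesis studies three specific topics in image understanding: image matching, image classification, and object detection. Within the general framework of latent variable models, the focus is on how to build a latent variable model tailored to each of these three problems and how to design effective learning methods for it. The work is as follows: 1) An image matching method based on latent image pose and illumination. The core of image matching is to match the same object across two images. The usual approach extracts feature points that are invariant to viewpoint and illumination. This thesis studies how to match two images accurately when the viewpoint or illumination changes substantially, that is, when feature detectors and descriptors that are robust only to small viewpoint changes or slight illumination changes fail. It takes the position that no feature detector or descriptor is fully invariant, and so sets aside the design of new detectors and descriptors. Instead, building on the mining of latent image variables, it extracts the latent pose and illumination information relating the objects in the images to be matched and converts a matching problem with large viewpoint and strong illumination changes into one with small viewpoint and weak illumination changes, which lowers the difficulty of matching and improves its accuracy. 2) An image classification method based on latent image structure variables. Taking the latent structural information of an image as the starting point, it addresses how to describe spatial structure within the conventional Bag of Visual Words (BoVW) image classification framework. Spatial structure plays a decisive role in image classification, yet conventional BoVW models treat it only in a simple way. The most widely used treatment is Spatial Pyramid Matching (SPM). SPM simply partitions the image into regions and extracts BoVW features from each region, thereby embedding spatial structure implicitly. Image structure is flexible, however, and a fixed partition cannot capture that variation, so SPM-based methods remain limited. This thesis proposes a spatial pyramid matching method based on latent image structure variables. It builds on conventional SPM and improves how SPM represents spatial position information. By mining the image's latent...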
English abstract	Image understanding, including image matching, image classification, and object detection, is a basic problem and an important component of computer vision and pattern recognition. It underlies camera calibration, image-based 3D reconstruction, content-based image retrieval, object tracking, action recognition, and related tasks. It affects the research and development of intelligent visual surveillance, web image and video understanding and management, and large-scale visual data mining. It also provides computational experiments for the cognitive sciences and helps us understand the brain better. Current work on image understanding is usually based on image processing techniques combined with statistical machine learning algorithms. These studies have made remarkable progress over the past ten years and have led to successful applications in some areas. However, current algorithms still lag behind the human brain in discrimination and robustness under changes in view, illumination, deformation, occlusion, etc. These challenges are among the most difficult problems in this area. Latent-variable-based pattern recognition is a new and promising modeling approach. A latent variable model considers the observed data and the latent data simultaneously, modeling both to obtain a more comprehensive, discriminative, and robust model. This thesis focuses on image recognition, especially image matching, image classification, and object recognition. We attempt to model these traditional problems under the latent variable framework and to design new models and learning algorithms, including: 1) We model the view and illumination change in image matching with latent variables. We hold that no local feature detector or descriptor is fully invariant. We mainly focus on the challenge of image matching under large view and illumination changes. With the latent view and illumination variables, image matching under large view and illumination change is transformed into a problem with little or no view and illumination change, which can be solved by traditional image matching algorithms. 2) We study the structure of images and introduce a latent structure variable to address an important issue in the traditional Bag of Visual Words (BoVW) model. The traditional BoVW model ignores the spatial structure of the image. To solve this problem, Spatial Pyramid Matching (SPM) was proposed. However, SPM models the s...
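The fixed-partition SPM baseline that the abstracts describe (split the image into regions, build a BoVW histogram per region, concatenate) can be sketched roughly as follows. This is an illustrative sketch only, not the thesis's latent-structure method: the function names, the `(x, y, word)` keypoint format, and the regular 2^l x 2^l grid are assumptions made for the example, and the per-level weights used in the usual SPM kernel are omitted.

```python
from collections import Counter


def bovw_histogram(words, vocab_size):
    """Count how often each visual-word index occurs in `words`."""
    counts = Counter(words)
    return [counts.get(k, 0) for k in range(vocab_size)]


def spm_feature(keypoints, width, height, vocab_size, levels=2):
    """Concatenate per-cell BoVW histograms over a fixed grid pyramid.

    keypoints: list of (x, y, word), with 0 <= x < width, 0 <= y < height,
    and `word` an integer index into a hypothetical visual vocabulary.
    Level l splits the image into 2**l x 2**l equal cells (an assumption
    of this sketch; the partition is fixed, which is the limitation the
    abstract points out).
    """
    feature = []
    for level in range(levels + 1):
        n = 2 ** level
        # One bucket of visual words per grid cell at this level.
        cells = [[[] for _ in range(n)] for _ in range(n)]
        for x, y, word in keypoints:
            col = min(int(x * n / width), n - 1)
            row = min(int(y * n / height), n - 1)
            cells[row][col].append(word)
        # Row-major concatenation of the cell histograms.
        for row_cells in cells:
            for cell_words in row_cells:
                feature.extend(bovw_histogram(cell_words, vocab_size))
    return feature


# Hypothetical toy input: four keypoints, one near each image corner.
toy_keypoints = [(10, 10, 0), (90, 10, 1), (10, 90, 1), (90, 90, 2)]
toy_feature = spm_feature(toy_keypoints, width=100, height=100,
                          vocab_size=3, levels=1)
```

With one whole-image cell and four quadrant cells, the toy feature has (1 + 4) x 3 = 15 entries. Because the cell boundaries never move, an object that shifts across a boundary changes the feature abruptly, which is the rigidity that a latent structure variable is meant to relax.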
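Keywords	Image Understanding; Latent Variable Model; Image Matching; Image Classification; Object Detection
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6478
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	Yu Yinan. Image Understanding with Latent Variables [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences, 2012.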

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20091801462806 (10643 KB)			Not yet open	CC BY-NC-SA