Research on Visual Structure Modeling and Inference in Object Recognition


	Research on Visual Structure Modeling and Inference in Object Recognition
	刘康伟
	2015-05-30
Degree type	Doctor of Engineering
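Chinese abstract	Object recognition is a classic problem in computer vision that covers tasks such as object classification, object detection, and object matching. Research on object recognition lays the foundation for many high-level vision tasks and has significant value in industrial applications. Although the field has made great progress in recent years, object recognition remains very challenging, especially when the objects in an image exhibit complex variations in appearance and pose. This thesis studies how to use visual structure models to describe and represent objects robustly, and thereby address difficulties such as the elastic deformation of objects in recognition tasks. Visual structure models play a vital role in object recognition, and their study involves three problems: structure modeling, structure learning, and structure inference. We investigate all three problems in depth and propose a series of effective algorithms for deformable structure modeling of objects, structured learning of models, and fast inference in structure models. The thesis makes the following contributions. First, to address difficulties such as object deformation in computer vision, we propose a structure model based on physical deformation decomposition. We analyze the deformation mechanism of objects from the principles of mechanics and propose a new deformation decomposition model to describe and handle complex and diverse deformations. Based on this model, we cast matching between deformed objects as an inference problem on a random field structure model, and obtain the correspondences between the deformed objects through effective inference on the random field. The method represents complex object deformations effectively and is applied successfully to different recognition tasks, such as handwriting recognition and object detection. Second, to address structured learning in object recognition, we propose a data-driven deep structure learning algorithm. We introduce a new structure layer into neural network models and train the structure model end to end with deep learning, which effectively learns the structured representation and structure parameters of objects and improves the model's ability to describe deformed objects. Experiments on different recognition tasks, such as object classification and detection, show that deep structure learning substantially improves both the structural representation and the recognition ability of visual structure models. Third, to speed up inference in visual structure models, we propose an improved generalized range move inference algorithm for random field structure models with one-dimensional labels. We cast the iterative optimization of the range move algorithm as a set cover problem, which greatly reduces the number of unnecessary label moves and speeds up inference. We also extend the theoretical scope of range move inference so that it can be applied to the optimization of arbitrary energy functions in structure models. Experiments on image denoising and stereo matching show that the algorithm greatly speeds up range move inference while preserving optimization quality. Fourth, also to speed up inference, we propose a fast inference algorithm based on label coordinate descent for random field structure models with two-dimensional labels. The algorithm optimizes the model quickly by performing label coordinate descent along the horizontal, vertical, and diagonal directions of the label space in turn. Unlike earlier algorithms, which exhaustively traverse every label in the label space, it exploits the two-dimensional spatial structure of the label set and restricts each node in the random field to move along one specific direction in the label space. It therefore need not traverse all labels and achieves a lower time complexity in each iteration. Experiments on deformable object matching and optical flow estimation show that the algorithm markedly speeds up inference while preserving optimization quality.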
English abstract	Object recognition is a fundamental problem in computer vision, and it covers many important vision tasks, such as object matching, object classification, and object detection. Object recognition techniques lay the foundations for high-level vision tasks and are widely used in a large number of industrial applications. In recent years, great progress has been made in object recognition, but it remains a challenging problem, especially when objects undergo large deformations. This thesis focuses on how to describe deformable objects with visual structure models and thereby solve difficult problems in object recognition (e.g., the elastic deformation of objects). Research on visual structure models is vitally important to object recognition and involves three problems: how to model the structural properties of objects (structure modeling), how to learn the parameters of the structure model (structure learning), and how to infer the structure configuration (structure inference). We address these three problems and propose a series of algorithms to solve them. The main contributions of this thesis are as follows: To describe and model deformable objects in vision tasks, we propose a novel structure model based on physical deformation decomposition. First, we analyze the deformation mechanism physically and propose a novel deformation decomposition model to describe varied and complex deformations. Based on this physical deformation model, we formulate the matching problem as a Markov Random Field (MRF) with two-dimensional labels, whose energy function is derived from the deformation decomposition model. Furthermore, we propose a two-stage method to optimize the MRF energy function effectively. To provide a quantitative benchmark, we build a deformation matching database with an evaluation criterion.
Experimental results show that our method outperforms previous approaches, especially on complex deformations. Finally, we apply the proposed method to two challenging vision tasks: handwriting recognition and deformable object detection. To train the parameters of structure models efficiently, we propose a novel structure learning algorithm based on deep learning. We develop a novel structure layer in deep neural networks to describe deformable objects, and we train the structure parameters by error backpropagation. This deep structure model not only learns the parameters of structure models but also improves the ability of deep neural networks to describe complex and varied object deformations. The deep structure model has been applied successfully to object classification and object detection. To obtain the configuration of structure models efficiently, we propose generalized range move algorithms (GRMAs) for inference in structure models. We extend the GRMAs to more general energy functions by restricting the labels chosen in each move so that the energy function is submodular on the chosen subset. Furthermore, we provide a feasible sufficient condition for choosing these subsets of labels. Meanwhile, we obtain the iterative moves dynamically by solving set cover problems, which greatly reduces the number of moves during optimization. We also propose a fast graph construction method for the GRMAs. Experiments show that the GRMAs offer a great speedup over previous range move algorithms while yielding competitive solutions. For efficient inference in structure models with two-dimensional labels, we develop a fast label coordinate descent algorithm (FastLCD). It optimizes 2D-label MRF models by performing label coordinate descent alternately in the horizontal, vertical, and diagonal directions.
FastLCD exploits the fact that the label set is two-dimensional and restricts the pixels to change their labels along one direction in the label space. In this way, FastLCD benefits from a lower time complexity. The experimental results show that FastLCD offers a great speedup over traditional algorithms while obtaining competitive solutions.
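The directional restriction described for FastLCD can be illustrated with a small sketch. This is not the thesis's algorithm: the abstract does not say how each directional move is optimized, so the code below assumes a much simpler greedy, node-by-node update (in the style of iterated conditional modes). The energy form (a unary cost table plus a weighted L1 smoothness term), the use of both diagonals, and all function and parameter names are assumptions made for illustration. The sketch only shows why a directional update is cheaper: each node scans at most `size` labels on a line instead of all `size * size` labels.

```python
import random

# Line directions in the 2D label space: horizontal, vertical, and two
# diagonals (using both diagonals is an assumption of this sketch).
DIRECTIONS = [(1, 0), (0, 1), (1, 1), (1, -1)]


def energy(labels, unary, edges, weight):
    """Total energy: unary costs plus a weighted L1 smoothness term."""
    total = sum(unary[i][u][v] for i, (u, v) in enumerate(labels))
    for i, j in edges:
        (ui, vi), (uj, vj) = labels[i], labels[j]
        total += weight * (abs(ui - uj) + abs(vi - vj))
    return total


def local_cost(i, label, labels, unary, neighbors, weight):
    """Part of the energy that depends on node i taking `label`."""
    u, v = label
    cost = unary[i][u][v]
    for j in neighbors[i]:
        uj, vj = labels[j]
        cost += weight * (abs(u - uj) + abs(v - vj))
    return cost


def line_labels(label, direction, size):
    """Labels on the line through `label` along `direction`, inside the grid."""
    u, v = label
    du, dv = direction
    line = []
    for t in range(-(size - 1), size):
        cu, cv = u + t * du, v + t * dv
        if 0 <= cu < size and 0 <= cv < size:
            line.append((cu, cv))
    return line


def directional_descent(unary, edges, size, weight=1.0, max_sweeps=50):
    """Greedy descent in which every update moves one node along one line.

    Assumes `edges` has no self-loops or duplicate pairs. The result is a
    local minimum of the energy, not necessarily the global one.
    """
    n = len(unary)
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    labels = [(0, 0)] * n  # start every node at label (0, 0)
    for _ in range(max_sweeps):
        changed = False
        for direction in DIRECTIONS:
            for i in range(n):
                def cost(label):
                    return local_cost(i, label, labels, unary, neighbors, weight)
                # Scan at most `size` labels instead of size * size.
                best = min(line_labels(labels[i], direction, size), key=cost)
                if cost(best) < cost(labels[i]):
                    labels[i] = best
                    changed = True
        if not changed:
            break
    return labels


if __name__ == "__main__":
    random.seed(0)
    size, n = 5, 6
    unary = [[[random.random() for _ in range(size)] for _ in range(size)]
             for _ in range(n)]
    edges = [(i, i + 1) for i in range(n - 1)]  # a simple chain of nodes
    result = directional_descent(unary, edges, size, weight=0.5)
    print(result, energy(result, unary, edges, 0.5))
```

Because a node's label changes only when its local cost strictly decreases, the total energy never increases from one update to the next. The price of the cheaper scan is that a node can get stuck when the better label does not lie on any of the lines through its current label.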
Keywords	object recognition; object detection; deformable object matching; structure modeling
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/11826
Collection	Graduates_Doctoral Dissertations
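Author affiliation	Institute of Automation, Chinese Academy of Sciences
First author affiliation	Institute of Automation, Chinese Academy of Sciences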
Recommended citation (GB/T 7714)	刘康伟. 物体识别中的视觉结构建模与推理研究[D]. 北京. 中国科学院研究生院,2015.

Files in this item
File name/size	Document type	Version type	Access type	License
物体识别中的视觉结构建模与推理研究.pd（11050KB）	Thesis		Restricted access	CC BY-NC-SA