Object Detection Based on Visual Structure Representation and Modeling

	基于视觉结构表达与建模的物体检测研究
Alternative title	Object Detection Based on Visual Structure Representation and Modeling
	Zhang Junge (张俊格)
	2013-06-02
Degree type	Doctor of Engineering
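Chinese abstract	Object detection is one of the most fundamental research problems in computer vision, and it directly affects many other vision problems such as object tracking, action recognition, and scene understanding. It has broad applications in video surveillance, biometric recognition, human-computer interaction, multimedia retrieval, computational advertising, driverless vehicles, and many other fields. Although much work has addressed this problem, object detection remains a very difficult topic, and the hardest part is obtaining a robust object representation. This thesis studies the problem from the perspective of visual structure representation and modeling and carries out the following work: 1) By summarizing existing work and people's existing conceptual understanding of structure, the thesis gives a precise definition of visual structure, and further discusses considerations and a technical roadmap for visual structure representation and modeling. 2) Drawing on signal processing, scale-space theory, and past successful cases, the thesis constructs a local structured descriptor, then proposes a Boosting-based feature fusion scheme for local structured descriptors and incorporates it into a topological star (constellation) structure model. The method won first place in the PASCAL VOC 2010 challenge, representing the international state of the art in the field. 3) To provide more flexible structural description at the level of the topological model, the thesis proposes a spatial mixture structure model for part-based models. To reduce model complexity, a data decomposition algorithm for part-based models is proposed first, and a spatial mixture structure modeling algorithm is then built on top of it, improving the model's robustness to changes in viewpoint and pose. The method won the detection task of the PASCAL VOC 2011 challenge, again demonstrating its effectiveness and leading position. 4) The preceding work all assumes a known structural topology, whereas the ultimate goal of the thesis is to learn the structural topology automatically; the thesis therefore proposes a data-driven framework for learning object structure. Experimental results show that the proposed automatic structure learning effectively overcomes interference such as occlusion, deformation, and background clutter. 5) Quantitative analysis of the results of the previously proposed methods shows that these systems have low recall, although there is in fact great potential for improving it. On this basis, the thesis proposes an effective learning-based, end-to-end semantic window mining system that raises recall and, to some extent, accuracy. The algorithm starts from structured modeling and ultimately derives a model and paradigm suited to semantic window mining. From this perspective, the algorithm in this chapter is also an exploration of applying structured-information modeling to post-processing. 6) Object detection is the most fundamental problem in computer vision research and also the one with the greatest application potential. In an international collaboration project, we explored in depth the application of object detection (both moving-object and static-object detection) to intelligent video analysis for home security, and achieved the related technology transfer. For the platform application, the thesis proposes moving-object analysis based on historical patterns and a fast pedestrian detection algorithm based on scene structure learning, which substantially improve the robustness and reliability of the system in home environments.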
English abstract	Object detection is a fundamental problem in computer vision, and its performance directly influences many other problems such as object tracking, 3D reconstruction, behavior analysis, and scene understanding. Object detection also has wide application in visual surveillance systems, biometrics, human-machine interfaces, content-based multimedia retrieval, computational advertising, and driverless cars. Although there is a vast literature on this topic, object detection remains a very challenging research problem. A general algorithm pipeline for object detection includes object representation, machine learning and optimization, a window sampling strategy, and post-processing. Building a robust object representation is the most important of these factors. In this thesis, we address this issue from the perspective of visual structure representation and modeling, and our contributions include: 1) By reviewing established work and summarizing how people conceive of "structure", we give a precise definition of visual structure. Moreover, we discuss how to study visual structure representation and modeling and give a technical roadmap. 2) Inspired by research in signal processing, scale-space theory, and past successful cases, we propose the Local Structured Descriptor (LSD). At the system level, we develop a topological star model based on boosted Local Structured Descriptors. With this method, we entered the PASCAL VOC2010 challenge and won first place. 3) To enable the topological model to capture more flexible structure, we propose spatial mixture modeling for part-based models. We first reduce the space and time complexity of the model with a data decomposition framework, and then present the proposed spatial mixture modeling. The resulting spatial mixture model is more robust to variations in viewpoint and pose. In 2011, we participated in the PASCAL VOC2011 challenge and again won first place, which indicates the method's leading position in object detection. 4) The previous work is all based on manually designed topological structures, while our goal is to learn the structure topology from data. Motivated by this, we propose a framework of data-driven automatic structure learning for object detection. The experimental results show that the developed method handles occlusion, deformation, and cluttered background well. 5) Through quantitative analysis of the previous methods, we find the recall rate ...
Keywords	Visual Structure; Object Detection; Visual Representation; Machine Learning
Language	Chinese
Document type	Dissertation
Item identifier	http://ir.ia.ac.cn/handle/173211/6557
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	张俊格. 基于视觉结构表达与建模的物体检测研究[D]. 中国科学院自动化研究所. 中国科学院大学,2013.

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20101801462807 (10459 KB)			Not currently open	CC BY-NC-SA