Research on Object Classification and Detection Based on Visual Saliency


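	Research on Object Classification and Detection Based on Visual Saliency
Alternative title	Object classification and detection based on visual saliency
	黄永祯 (Huang Yongzhen)
	2011-05-28
Degree type	Doctor of Engineering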
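Chinese abstract	Object classification and detection are fundamental problems and key steps in computer vision and pattern recognition, and they directly affect research on object tracking, action recognition, scene understanding and more. They are also key problems in applications of computer vision and pattern recognition such as video surveillance, biometric recognition and image retrieval, and they form an important link between disciplines, bearing on the development of fields such as medical imaging, cognitive neuroscience and visual psychology. At present, the vast majority of research on object classification and detection is built on existing computer vision frameworks, represented by Marr's theory of vision. This work lacks grounding in visual cognitive mechanisms. In particular, it overlooks the importance of visual saliency in feature representation and in modeling the relations among features, and so differs substantially from the human visual system. This is one of the main reasons why current object classification and detection systems lack robustness and are relatively slow in complex scenes. This thesis focuses on two problems in object classification and detection, feature representation and feature relation modeling, and carries out the following work: 1) Inspired by sparse coding and feedback mechanisms in neurophysiology, we use local visual saliency to improve the well-known hierarchical model, raising its accuracy in object classification and detection, and we use a Boosting-based feedback mechanism to significantly increase its speed. 2) Using the saliency mechanism of local features, we propose salient coding and apply it to the visual codebook model. We show, from the perspectives of geometry and numerical analysis, that salient coding is more accurate than other coding schemes. In computational complexity, salient coding is an order of magnitude lower than the currently best-performing coding scheme. 3) Adopting the principle that global relations among features take priority, we propose a new visual codebook framework based on a feature graph model. This work is the first internationally to model the relations among visual words in feature space, breaking with the traditional codebook emphasis on local over global information. We also prove that previous visual codebook models are special cases of this framework. With computational complexity essentially unchanged, the proposed method greatly improves object classification accuracy and achieves the best results to date on several mainstream object classification datasets. 4) Based on global saliency cues such as the overall scene and the object category, we propose an object detection model that integrates global saliency and considerably improves detection accuracy. This work took first place in the PASCAL VOC2010 object detection competition, representing the current international state of the art in object detection. 5) Inspired by the theory of topological perceptual organization, we propose a shape description method that prioritizes global features. It overcomes the insufficient description of global features in traditional shape classification algorithms and greatly improves the accuracy of traditional shape recognition. 6) Building on the work above, we attempt to bring local saliency and global saliency into a unified framework. Starting from a mathematical description in non-Euclidean space, we propose methods for feature representation and feature relation modeling based on non-Euclidean space, which differ fundamentally from traditional visual computing theory based on Euclidean space. In validating visual computation in non-Euclidean space, we obtain results strikingly similar to those of the human visual system, showing the great potential of non-Euclidean space for studying object feature representation.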
English abstract	Object classification and detection is one of the fundamental problems in computer vision and pattern recognition. It is also a critical step that directly influences many other computer vision tasks, such as object tracking, action recognition and scene understanding. Object classification and detection can be applied in many areas, e.g., visual surveillance, biometrics and image retrieval. It is also an interdisciplinary topic linking computer vision with other domains such as medical imaging, neuroscience and visual psychology. Most current work on object classification and detection is based on Marr's computational vision theory. These studies, however, ignore cognitive mechanisms. For example, they do not recognize the importance of visual saliency in feature representation. As a result, there is a large gap between computer vision systems and the human visual system in many respects, e.g., robustness and efficiency. In this thesis, we attempt to address these issues. Our contributions include: 1) We improve the well-known HMAX model and propose an enhanced biologically inspired model. Compared with the original model, ours is at least 20 times faster while also improving the accuracy of object classification. 2) We study the visual saliency of local features and develop salient coding for the codebook-based model. We prove, from the viewpoint of geometric and numerical analysis, that salient coding is better than other coding methods. Moreover, the computational complexity of salient coding is much lower than that of the current best coding schemes. 3) We present a codebook graph based model. This model is, to the best of our knowledge, the first to model the relations of visual words in feature space. The new framework greatly enhances the accuracy of object classification. Meanwhile, we prove that traditional codebook models are special cases of our framework. The proposed model achieves state-of-the-art performance on a number of popular object classification databases. 4) We integrate information about scenes and object classes into object detection. This strategy helps improve the accuracy of object detection. The system based on this work was ranked the best-performing object detection algorithm in PASCAL VOC2010. 5) We propose a global-to-local shape description method inspired by the topological perceptual organization theory. It addresses the problem that traditional shape classification methods ignore global propert...
Keywords	Object Classification; Object Detection; Feature Representation; Feature Relation Modeling; Non-Euclidean Space
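Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6363
Collection	Graduates_Doctoral Theses
Recommended citation (GB/T 7714)	黄永祯. Research on Object Classification and Detection Based on Visual Saliency [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences, 2011.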

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20081801462804 (14247 KB)			Not yet open	CC BY-NC-SA