Research on Hierarchical Feature Representation Learning in Large Scale Image Object Recognition

Title	基于层次化表达学习的大规模图像识别研究
Alternative title	Research on Hierarchical Feature Representation Learning in Large Scale Image Object Recognition
Author	Ren Weiqiang (任伟强)
Date	2014-11-27
Degree type	Doctor of Engineering
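Chinese abstract (translated)	Image recognition is one of the fundamental problems in computer vision. Research in this field covers the most basic tasks in computer vision, including object classification, object detection, and segmentation, and it is a problem that must be solved to achieve high-level visual semantic analysis such as intelligent visual analysis. In recent years, research on image recognition has made great progress, and theories and methods based on the visual dictionary model and deep learning have achieved state-of-the-art results on a series of image recognition datasets. However, the performance of current image recognition algorithms still lags considerably behind the human visual system. Variations in scale, illumination, viewpoint, and deformation in real images, as well as severe object occlusion, pose little difficulty for the human visual system but cannot be fully overcome by current image recognition algorithms. Studying the open problems in image recognition, drawing on the mechanisms of visual perception to improve existing feature representation theory, and bridging the gap between computer vision and human vision are of great theoretical significance and pressing practical need. As computer vision is applied in everyday life, the era of big data has quietly arrived; the explosive growth in data scale poses new challenges and requirements for current image recognition algorithms. Efficient feature extraction algorithms for large-scale data and theoretical frameworks suited to parallel computing and online learning have also become important topics in image recognition. Centered on the basic problem of hierarchical representation learning, this thesis carries out the following work: 1. We first study basic feature learning units extensively. To better understand them, we do the following: 1) We propose a local hypersphere coding algorithm based on differences between visual words. As a single-layer feature transformation algorithm, it reconstructs features on a hypersphere close to a hyperplane, yielding more discriminative local features and thereby improving image classification and recognition performance. 2) We propose an auto-encoder based on the maximum correntropy criterion as a single-layer feature learning unit. Unlike ordinary auto-encoders that use mean squared error (MSE) or cross-entropy as the reconstruction term, we train the auto-encoder with the recently proposed Maximum Correntropy Criteria (MCC), which is equivalent to describing features in an infinite-dimensional kernel space. This implicit embedding into a nonlinear high-dimensional feature space gives the MCAE model stronger representation and structure discovery ability than MSE-based models. 2. We study hierarchical feature representation learning in depth and make the following contributions: 1) We propose an explicit kernel mapping algorithm based on deep neural networks that maps features to an approximate kernel space, so that the mapped features deliver the performance gains of nonlinear kernel learning. 2) We propose a K-nearest-neighbor classification algorithm based on nonlinear convolutional feature learning, which performs hierarchical representation learning on a convolutional neural network directly in a data- and task-driven manner. By using neighborhood components analysis as the training objective, we can approximately optimize the K-nearest-neighbor classification error, so that the learned features are better distributed in their feature space. 3. For large-scale image recognition applications in real-world scenes, we propose the following practical algorithms based on hierarchical representation models: 1) We propose a weakly supervised object detection algorithm based on linear multiple-instance learning. The method uses a large-scale deep convolutional network trained on the ImageNet dataset as prior knowledge to encode features of candidate windows, and introduces a linear support vector machine multiple-instance learning algorithm that can efficiently discover object categories and learn object detection models automatically from large-scale data with classification labels. 2) We propose a semi-supervised object detection algorithm based on neural networks. The algorithm targets partially annotated data, where only category labels for all images and object location annotations for a small portion of images are available...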
English abstract	Image object recognition is one of the fundamental problems in computer vision. It covers several important tasks, including image classification, object detection, and image segmentation. Image recognition is also the first problem to solve before tackling higher-level visual semantic analysis. In recent years, image recognition based on Bag of Visual Words (BoVW) and deep learning has made great progress on many difficult image recognition datasets. However, there is still a big gap between computer vision systems and the human visual system. Large variations in scale, illumination, viewpoint, and deformation in real images, as well as severe occlusion, pose little difficulty for humans but remain great challenges for image recognition algorithms. Investigating the difficulties in image recognition, improving feature representation theory based on visual perception theory, and bridging the gap between human vision and computer vision are of great theoretical value and pressing practical demand. As more image recognition systems are deployed in real applications, the big data explosion has also created new challenges and new needs for computer vision algorithms. Efficient feature extraction methods and online learning frameworks suitable for parallel computing have also become important research topics. In this thesis, starting from the basic idea of hierarchical feature learning, we attempt to improve image recognition systems with the following contributions: 1. We intensively study the basic feature learning modules for image object recognition. To better understand the basic feature learning unit, we do the following: 1) We propose a Local Hypersphere Coding (LHC) algorithm that performs feature encoding based on the differences between visual words. As a single-layer feature learning algorithm, LHC produces more discriminative feature representations and better image recognition performance. 2) We propose a maximum correntropy auto-encoder (MCAE), which learns more robust and discriminative representations than MSE-based models by performing computation in an infinite-dimensional kernel space. 2. We study hierarchical feature learning for image object recognition in depth. Our contributions include: 1) We exploit the power of kernels by learning a kernel embedding neural network that explicitly maps data from Euclidean space to an approximated kernel space. 2) We propose a convolutional nonlinear f...
Keywords	Image Recognition; Object Detection; Hierarchical Representation; Feature Learning; Large-scale Data
Language	Chinese
Document type	Dissertation
Item identifier	http://ir.ia.ac.cn/handle/173211/6655
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	任伟强. 基于层次化表达学习的大规模图像识别研究[D]. 中国科学院自动化研究所. 中国科学院大学,2014.
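
The abstract contrasts an MSE reconstruction term with the maximum correntropy criterion (MCC). The record does not give the thesis's actual MCAE objective, so the following is only a minimal sketch of the general idea, assuming the common Gaussian-kernel form of empirical correntropy with the normalization constant omitted. The function names and the kernel width `sigma` are hypothetical choices made for illustration.

```python
import math


def mse(x, x_hat):
    """Mean squared reconstruction error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def correntropy(x, x_hat, sigma=1.0):
    """Empirical correntropy with an unnormalized Gaussian kernel.

    Each element contributes a value in (0, 1], so the mean is also in (0, 1].
    sigma is an assumed kernel width, not a value taken from the thesis.
    """
    total = sum(
        math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for a, b in zip(x, x_hat)
    )
    return total / len(x)


def mcc_loss(x, x_hat, sigma=1.0):
    """Loss to minimize under MCC: one minus the correntropy, bounded in [0, 1)."""
    return 1.0 - correntropy(x, x_hat, sigma)


x = [0.0, 1.0, 2.0, 3.0]
clean = [0.1, 0.9, 2.1, 2.9]        # small error on every element
outlier = [0.1, 0.9, 2.1, 103.0]    # one grossly corrupted element

print("MSE  clean/outlier:", mse(x, clean), mse(x, outlier))
print("MCC  clean/outlier:", mcc_loss(x, clean), mcc_loss(x, outlier))
```

In this sketch a single corrupted element dominates the MSE, while its contribution to the MCC loss is capped at 1/n because the Gaussian kernel saturates. That bounded influence is one plausible reading of the robustness the abstract attributes to MCC-based training.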

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20111801462805 (9551 KB)			Not yet open	CC BY-NC-SA