Research on Feature Representation Methods in Image Recognition (图像识别中的特征表达方法研究)

Alternative title	Research on Feature Representation for Image Recognition
Author	刘颖璐 (Liu Yinglu)
Date	2015-05-26
Degree type	Doctor of Engineering
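Chinese abstract (English translation)	Image recognition is a core branch of computer vision. It draws on digital image processing, pattern recognition and machine learning, and it underpins applications such as image retrieval, image annotation, human-computer interaction and intelligent video surveillance. Feature representation is both the key issue and the main difficulty in image recognition research. Taking scene image classification and object recognition as its tasks, this thesis studies feature representation for image recognition from three perspectives: compactness, semantics and discriminability. The main research content and contributions are as follows.

A compact feature representation that incorporates spatial structure information is proposed. The traditional bag-of-words model relies on the spatial pyramid model at the pooling stage to supply spatial information, but the feature dimensionality of the spatial pyramid grows exponentially with the number of pyramid levels. To address this, the thesis concatenates spatial feature descriptors onto the appearance descriptors, so that the standard bag-of-words model alone can extract multi-scale spatial information from the image. Three spatial feature descriptors are defined, and the information redundancy of the spatial pyramid model is explained from the viewpoint of multiple-codebook learning. Experiments on two public datasets show that the proposed method outperforms the spatial pyramid model in both compactness and discriminability.

A Boosting-based adaptive feature pooling method is proposed. Most traditional pooling methods are heuristic and cannot fully exploit the discriminative spatial information in images. The proposed adaptive spatial pooling parameterizes pooling as a matrix and models it jointly with the classifier. This makes it possible to learn more flexible pooling functions (not only average pooling and max pooling) and to extract more diverse spatial structures (not only regular rectangular regions). Moreover, unlike existing methods in which all categories share the same pooling operation, the thesis uses discriminative learning to learn a category-specific pooling for each class, making full use of the discriminative spatial information in images. The effectiveness of the algorithm is verified on three scene image datasets.

A CNN feature pooling method based on discriminative multi-class learning is proposed. Existing results show that using a CNN model pre-trained on a large-scale database directly as a feature extractor on general small-scale image recognition datasets significantly outperforms traditional methods based on hand-crafted features. However, taking the fully connected layer output directly as the image representation does not make full use of spatial structure information. To address this, the thesis uses convolutional-layer feature maps, which are rich in spatial structure information, together with a retargeted least-squares regression model to learn a discriminative multi-distribution weighted pooling. Experiments show that the proposed features are highly complementary to CNN fully connected layer features; combining the two yields the highest recognition performance on several datasets.

A subspace learning method for multi-source heterogeneous data that constrains similarity in the label space is proposed. A single classification task often involves data from several sources. The main goal of multi-source heterogeneous data classification is to use a source with abundant labeled samples to help a source with few labeled samples; the difficulty is that data from different sources may differ in both feature dimensionality and feature distribution. To address this, the thesis proposes a subspace learning method based on support vector machines. Unlike traditional methods, which constrain similarity in the feature space, it constrains similarity in the label space, making full use of the correlation and discriminative information across the heterogeneous sources. The algorithm is validated on both document data and image data.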
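To make the dimensionality argument in the first contribution concrete: a spatial pyramid with levels l = 0, ..., L-1 concatenates one K-bin histogram per cell, and level l has 4^l cells, so its length is K(4^L - 1)/3 (21K for the common three-level 1x1, 2x2, 4x4 pyramid). The Python/NumPy sketch below computes that length and contrasts it with the alternative the abstract describes: appending a spatial descriptor to each appearance descriptor and building an ordinary K-bin bag-of-visual-words histogram. The abstract does not say which three spatial descriptors the thesis defines, so the normalized (x, y) position, the weight `alpha`, the random stand-in data and the randomly sampled codebook (used in place of k-means) are all assumptions for illustration, not the author's method.

```python
import numpy as np


def spm_dim(codebook_size, levels):
    """Length of a spatial-pyramid BoVW vector with levels l = 0 .. levels-1,
    where level l has 4**l cells and each cell contributes a K-bin histogram."""
    return codebook_size * sum(4 ** l for l in range(levels))


def augment_with_position(desc, xy, img_wh, alpha=1.0):
    """Assumed spatial descriptor (not the thesis's own): keypoint position
    normalized by image width/height and scaled by a hypothetical weight alpha,
    concatenated onto each appearance descriptor. desc: (N, D), xy: (N, 2)."""
    pos = alpha * (np.asarray(xy, dtype=float) / np.asarray(img_wh, dtype=float))
    return np.hstack([desc, pos])


def bovw_histogram(features, codebook):
    """Ordinary BoVW: hard-assign each feature to its nearest codeword and
    return an L1-normalized K-bin histogram. features: (N, D'), codebook: (K, D')."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)  # (N, K)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)


if __name__ == "__main__":
    print(spm_dim(1024, 3))  # 1x1 + 2x2 + 4x4 cells: 1024 * 21 = 21504

    rng = np.random.default_rng(0)
    desc = rng.random((500, 128))               # stand-in for 128-D local descriptors
    xy = rng.random((500, 2)) * [640, 480]      # keypoint locations in pixels
    feats = augment_with_position(desc, xy, (640, 480), alpha=2.0)
    # Stand-in for k-means: use 64 randomly chosen features as codewords.
    codebook = feats[rng.choice(len(feats), 64, replace=False)]
    h = bovw_histogram(feats, codebook)
    print(feats.shape, h.shape)                 # (500, 130) (64,)
```

In this sketch, because position is part of the vector being quantized, each visual word covers a region of both appearance space and image space, so a single K-bin histogram carries spatial information without multiplying the dimensionality by 21 as the three-level pyramid does.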
English abstract	Image recognition is one of the core branches of computer vision. It integrates techniques from digital image processing, pattern recognition and machine learning, and it is the foundation of image retrieval, image annotation, human-computer interaction, intelligent video surveillance, etc. Feature representation is a key and difficult issue in image recognition. This thesis focuses on natural scene image classification and object recognition, and studies feature representation for image recognition from the viewpoints of compactness, semantics and discriminability. Our efforts and contributions are as follows.

A compact spatial feature representation method for image classification is proposed. The traditional Bag-of-Visual-Words (BoVW) model usually resorts to Spatial Pyramid Matching (SPM) to exploit the spatial information of an image in the pooling step. However, the dimensionality of SPM increases exponentially with the number of pyramid levels. This thesis proposes an alternative to SPM for describing spatial information. By directly concatenating a spatial descriptor to the appearance descriptor, the ordinary BoVW model can exploit multi-scale spatial information efficiently. We design three spatial descriptors and explain, from the viewpoint of multiple-codebook learning, why SPM suffers from information redundancy. Experimental results on two public image datasets show that the proposed model is more compact and more discriminative than SPM.

An adaptive pooling method based on Boosting for image classification is proposed. Most traditional pooling methods are heuristic. Our adaptive pooling method parameterizes pooling as a matrix and models it jointly with a discriminative classifier. In this way, more flexible pooling functions can be learned (beyond average pooling and max pooling) and more diverse spatial layouts can be extracted (beyond regular rectangular regions). Unlike existing methods, in which all categories share the same pooling operation, our method learns category-specific pooling to further enhance discriminative power. Experimental results on three scene datasets demonstrate the effectiveness of our method.

A multi-class learning method for CNN-based feature pooling is proposed. Recent works show that a CNN model trained on a sufficiently large and diverse dataset such as ImageNet can be successfully transferred to other visual recognition tasks with a limited amount of training data. However, taking the output of the fully connected layer of the CNN as th...
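The adaptive pooling contribution treats pooling as a learned weighting of spatial positions. The Python/NumPy sketch below shows one way such a parameterization could look, under the assumption that pooling is linear: a K x P matrix of codeword responses over P grid positions is multiplied by a P-dimensional weight vector. Uniform weights recover average pooling, a normalized indicator over a rectangular block recovers one cell of a spatial pyramid, and arbitrary weights give non-rectangular layouts. Category-specific pooling is modeled by giving each class its own weight vector next to its own linear classifier. All function names are hypothetical, the exact matrix form used in the thesis is not given in the abstract, and the Boosting procedure that learns the weights is not shown; only the forward computation is.

```python
import numpy as np


def weighted_pool(codes, w):
    """Linear pooling (assumed form): codes is a (K, P) matrix of codeword
    responses at P spatial positions, w is a (P,) weight vector; returns (K,)."""
    return codes @ w


def average_weights(P):
    """Uniform weights: reproduces average pooling over all P positions."""
    return np.full(P, 1.0 / P)


def region_weights(grid_hw, rows, cols):
    """Hypothetical helper: weights that average over the rectangular block
    rows[0]:rows[1], cols[0]:cols[1] of a (gh, gw) grid flattened row-major,
    i.e. one cell of a spatial-pyramid-style division."""
    gh, gw = grid_hw
    mask = np.zeros((gh, gw))
    mask[rows[0]:rows[1], cols[0]:cols[1]] = 1.0
    mask = mask.ravel()
    return mask / mask.sum()


def class_scores(codes, V, W):
    """Category-specific pooling with a linear classifier per class.
    V: (C, K) classifier weights, W: (C, P) pooling weights, one row per class.
    score[c] = V[c] . (codes @ W[c])."""
    pooled = codes @ W.T                      # (K, C): one pooled feature per class
    return np.einsum("ck,kc->c", V, pooled)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, gh, gw = 8, 4, 4
    codes = rng.random((K, gh * gw))          # stand-in for coding responses on a 4x4 grid
    z_avg = weighted_pool(codes, average_weights(gh * gw))
    z_quad = weighted_pool(codes, region_weights((gh, gw), (0, 2), (0, 2)))  # top-left quadrant
    n_classes = 3
    V = rng.standard_normal((n_classes, K))
    W = rng.random((n_classes, gh * gw))
    W /= W.sum(axis=1, keepdims=True)         # each class's pooling weights sum to 1
    print(z_avg.shape, z_quad.shape, class_scores(codes, V, W).shape)  # (8,) (8,) (3,)
```

Max pooling is not a fixed linear weighting of positions, so it falls outside this particular sketch; here a learned weight vector generalizes average pooling and rectangular-region pooling, and making it class-specific turns the score into a bilinear function of the code matrix.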
Keywords	Image Recognition; Feature Representation; Spatial Structure Information; Adaptive Pooling; Multiple Outlooks Learning (multi-source heterogeneous data classification)
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6691
Collection	Graduates_Doctoral Dissertations
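Recommended citation (GB/T 7714)	刘颖璐. 图像识别中的特征表达方法研究[D]. 中国科学院自动化研究所. 中国科学院大学, 2015. (Liu Yinglu. Research on Feature Representation Methods in Image Recognition [D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2015.)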

Files in this item
File name/size	Document type	Version	Access	License
CASIA_20111801462805 (13896 KB)			Not yet open access	CC BY-NC-SA