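Image Recognition Methods Based on Deep Sparse Representation (基于深度稀疏表示的图像识别方法)
吕乐
Degree type: Doctor of Engineering
Advisor: 赵冬斌
2017-05
Degree-granting institution: Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Place of conferral: Beijing
Keywords: deep learning; sparse coding; image clustering; mid-level visual element extraction; fine-grained image classification
Abstract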

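For large-scale, high-dimensional image classification tasks, feature extraction algorithms built on handcrafted features struggle to recognize images quickly and accurately. Data-driven feature learning methods, deep learning in particular, can fully exploit parallel computing architectures and extract distributed or sparse representations from large amounts of data, which enables feature reuse and higher recognition accuracy. In practice, however, unlabeled images are far easier to obtain in bulk, and annotating them still requires substantial human effort. Unsupervised feature learning has therefore become a focus of current research.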

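  This thesis uses semi-supervised and unsupervised feature extraction methods to train neural networks that extract sparse feature representations, with the aim of further improving performance on recognition tasks. First, to address the weak discriminative power of features extracted by unsupervised algorithms, we propose a task-driven predictive sparse decomposition (PSD) algorithm. Next, we improve clustering algorithms on the basis of unsupervised feature learning: because feature representations are too high-dimensional to cluster easily, we propose the duality of deep sparse representations and use bipartite graph partitioning to discover abstract concepts in the feature representations. Finally, we apply deep neural networks to vehicle type recognition and analyze how different classification criteria and network architectures affect the recognition performance of networks trained with supervision. We then apply the clustering algorithm based on deep sparse representations to mid-level visual element mining, which yields an accurate and robust fine-grained vehicle classification algorithm. The thesis comprises the following work and contributions: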

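  Because they are affected by factors unrelated to class information, unsupervised methods cannot extract feature representations that effectively improve the accuracy of subsequent classification tasks. To address this, we propose a predictive sparse decomposition algorithm based on task-driven dictionary learning, which pretrains the neural network with a semi-supervised feature extraction algorithm. We prove theoretically that the model can be optimized by gradient descent. During training, sparse regularization selects the basis vectors most relevant to the input signal, which effectively prevents overfitting. Experiments show that the resulting feature representations both reconstruct the input signal accurately and are highly discriminative. These representations allow the fine-tuned neural network to reach higher recognition accuracy: the recognition error on the MNIST dataset drops from 2.04% to 1.98%.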

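  Because high-dimensional feature representations are hard to cluster, many clustering algorithms still rely on handcrafted feature extraction. To address this, we propose a duality between the feature representations of a winner-take-all autoencoder and the original inputs. On this basis, we use an undirected bipartite graph partitioning algorithm to produce an initial clustering of the images, and a support vector machine (SVM) to refine and merge the clusters. Experiments on the MNIST dataset show that the algorithm effectively clusters digit characters of different writing styles and then merges data with the same semantic concept into one class. The clustering accuracy is 95%, on par with the supervised K-nearest-neighbor algorithm.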

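  In a vehicle image retrieval system, vehicle category information effectively narrows the search range. For this application, we collected two vehicle datasets to study how the classification criterion affects the algorithm: the first contains 5 basic vehicle types, and the second contains 58 vehicle classes defined by manufacturer. On these datasets we built vehicle classification systems with different network architectures such as AlexNet and VGG. The results show that, when data is scarce, pretraining the network on the ImageNet dataset and then fine-tuning it on the vehicle dataset yields higher recognition accuracy. A network trained only on the vehicle data reaches 87.3% accuracy on the first dataset, whereas a network that is pretrained and then fine-tuned reaches 94.8% on the first dataset and 91.8% on the second.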

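  Vehicle image classification is a typical fine-grained classification problem, and algorithms that process the whole image directly struggle to achieve more accurate recognition. We therefore apply the clustering algorithm based on deep sparse representations to mid-level visual element mining; the mined image patches make local differences more salient. Finally, we propose an ensemble model that performs recognition from both the whole image and local image patches. The algorithm considers global and local features together and removes the influence of irrelevant image regions, achieving higher accuracy on fine-grained classification. It is also more robust than other methods when the image is occluded. In experiments on the CompCars dataset without occlusion, the best accuracy of mainstream convolutional network algorithms is 98.4% and that of our method is 98.8%. With occlusion, the convolutional network reaches 94% and our method reaches 96%.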

Traditional machine learning algorithms usually rely on handcrafted features combined with preprocessing and data transformations. For large-scale, high-dimensional image classification tasks, these algorithms cannot achieve satisfactory performance. Feature learning, and deep learning in particular, has therefore received extensive attention. It can take full advantage of parallel computing architectures and extract distributed and sparse representations from large amounts of data, and expressive representations lead to higher recognition accuracy. In practical applications, unlabeled images are much easier to obtain than labeled ones because annotation is labor-intensive. In this setting, unsupervised feature learning becomes even more important.

To improve recognition performance, we use semi-supervised and unsupervised feature learning methods to train neural networks that extract sparse representations. First, we propose a new semi-supervised predictive sparse decomposition, based on task-driven dictionary learning, to extract discriminative representations. Second, we use unsupervised feature learning to improve clustering algorithms. Because high-dimensional representations are hard to cluster, we propose the duality of deep sparse representations and formulate the clustering problem as bipartite graph partitioning. Finally, we apply deep convolutional neural networks to a vehicle recognition system and analyze how different network architectures and classification criteria affect performance. We further extend our clustering algorithm to mid-level visual element detection and implement a more robust fine-grained vehicle classification system. The main contributions are as follows.

The representations extracted by purely unsupervised methods may not be discriminative enough for the final classification task. To solve this problem, we propose task-driven predictive sparse decomposition, which trains the neural network in a semi-supervised way, and we prove that the new model can be optimized by stochastic gradient descent. During training, sparse regularization selects the atoms most relevant to the input signal. Like dropout, this helps prevent overfitting. Experiments on the MNIST dataset show that our method extracts more discriminative features. With these representations, the error rate of the fine-tuned neural network decreases from 2.04% to 1.98%.
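As a rough illustration of the predictive sparse decomposition idea mentioned above, the sketch below computes a sparse code with ISTA and evaluates a simplified form of the standard (unsupervised) PSD objective: reconstruction error, an L1 sparsity penalty, and a term tying the code to a feed-forward prediction. It is not the task-driven variant proposed in the thesis, whose details the abstract does not give. The names `ista_sparse_code` and `psd_loss`, the `tanh` encoder, and the parameters `lam` and `alpha` are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink every entry of v toward zero by t (proximal operator of the L1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_sparse_code(x, D, lam, n_iter=200):
    """Approximately minimize 0.5*||x - D z||^2 + lam*||z||_1 over z with ISTA."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the smooth part
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)             # gradient of the reconstruction term
        z = soft_threshold(z - grad / L, lam / L)
    return z

def psd_loss(x, D, z, W, b, lam, alpha):
    """Simplified PSD objective (assumed form): reconstruction + sparsity + prediction."""
    reconstruction = 0.5 * np.sum((x - D @ z) ** 2)
    sparsity = lam * np.sum(np.abs(z))
    predicted_code = np.tanh(W @ x + b)      # hypothetical one-layer encoder
    prediction = 0.5 * alpha * np.sum((z - predicted_code) ** 2)
    return reconstruction + sparsity + prediction

# Tiny usage example with an identity dictionary (illustrative values only).
x = np.array([1.0, 0.05, -2.0])
D = np.eye(3)
z = ista_sparse_code(x, D, lam=0.1)
loss = psd_loss(x, D, z, W=np.zeros((3, 3)), b=np.zeros(3), lam=0.1, alpha=1.0)
```

In this sketch the small entry of `x` is driven exactly to zero, which is the sense in which the L1 penalty "selects" the atoms most relevant to the input.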

Because high-dimensional representations are hard to cluster, most clustering algorithms still depend on handcrafted features. We therefore explore the duality between images and the features extracted by a winner-take-all autoencoder. Based on this property, we formulate image clustering as a bipartite graph partitioning problem and use a support vector machine (SVM) to refine the final clustering. Experiments on the MNIST dataset show that our algorithm can discover digit characters with similar writing styles and cluster images that share the same semantic concept. It achieves an accuracy of 95%, which is comparable to the K-nearest-neighbor method, a supervised method.
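The image/feature duality can be pictured as a bipartite graph whose two node sets are images and hidden units, with sparse activations as edge weights. The sketch below is a minimal illustration under stated assumptions, not the thesis's algorithm: `winner_take_all` uses a per-sample top-k rule (the abstract does not specify the sparsity rule), and `spectral_bipartition` uses a standard spectral co-clustering step (normalize the activation matrix, take the second singular vectors) to split images and features into two groups at once. Both function names are hypothetical.

```python
import numpy as np

def winner_take_all(H, k):
    """Keep the k largest activations in each row of H and zero out the rest."""
    H = np.asarray(H, dtype=float)
    idx = np.argpartition(H, -k, axis=1)[:, -k:]     # columns of the top-k entries
    out = np.zeros_like(H)
    np.put_along_axis(out, idx, np.take_along_axis(H, idx, axis=1), axis=1)
    return out

def spectral_bipartition(A):
    """Jointly split rows (images) and columns (features) of a nonnegative matrix.

    Assumes every row and every column of A has a positive sum.
    """
    A = np.asarray(A, dtype=float)
    d_rows = A.sum(axis=1)                           # degree of each image node
    d_cols = A.sum(axis=0)                           # degree of each feature node
    A_norm = A / np.sqrt(np.outer(d_rows, d_cols))   # D1^{-1/2} A D2^{-1/2}
    U, _, Vt = np.linalg.svd(A_norm, full_matrices=False)
    row_score = U[:, 1] / np.sqrt(d_rows)            # second left singular vector
    col_score = Vt[1, :] / np.sqrt(d_cols)           # second right singular vector
    return (row_score > 0).astype(int), (col_score > 0).astype(int)

# Toy activation matrix: images 0-2 mostly use features 0-1, images 3-5 use features 2-3.
A = np.full((6, 4), 0.01)
A[:3, :2] = 1.0
A[3:, 2:] = 1.0
image_labels, feature_labels = spectral_bipartition(A)
```

Which group receives label 0 or 1 depends on the sign convention of the SVD, so only the grouping itself is meaningful. Splitting into more than two clusters would require additional singular vectors followed by a clustering step, which is omitted here.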

In a vehicle image retrieval system, category information is used to narrow the search scope. We therefore collected two vehicle image datasets. The first is grouped by basic vehicle type and contains 5 classes; the second is grouped by manufacturer and contains 58 classes. We then evaluate different network architectures, such as AlexNet and VGG, on these datasets. The results show that pretraining on the ImageNet dataset improves performance significantly. When a CNN is trained only on the first dataset, the best recognition accuracy we obtain is 87.3%. The pretrained and fine-tuned CNN, by contrast, achieves 94.8% on the first dataset and 91.8% on the second.

Vehicle image classification is a typical fine-grained classification problem, and it is hard to achieve higher recognition performance by processing the whole image directly. We therefore extend the clustering algorithm of Chapter 4 to mine mid-level visual elements. These visual elements make local discriminative features more salient. Using both the whole image and image patches, we present an ensemble CNN model to predict the vehicle category. The algorithm removes the effect of irrelevant image regions and achieves higher recognition accuracy. It is also more robust when the image is occluded. We evaluate our algorithm on the CompCars dataset. Without occlusion, the recognition accuracies of the CNN and our method are 98.4% and 98.8%, respectively. With occlusion, the accuracy of the CNN drops to 94%, while our method reaches 96%.
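One simple way to combine a whole-image prediction with patch-level predictions is to average class probabilities. The sketch below illustrates that idea only; the abstract does not say how the thesis's ensemble combines its models, so the function `ensemble_predict`, the weight `whole_weight`, and the use of raw logits as inputs are all assumptions.

```python
import numpy as np

def softmax(logits):
    """Convert a vector of logits to probabilities in a numerically stable way."""
    shifted = logits - np.max(logits)
    e = np.exp(shifted)
    return e / e.sum()

def ensemble_predict(whole_logits, patch_logits, whole_weight=0.5):
    """Blend whole-image and mean patch probabilities; return (label, probabilities)."""
    p_whole = softmax(np.asarray(whole_logits, dtype=float))
    if len(patch_logits) == 0:                       # no patches found: whole image only
        return int(np.argmax(p_whole)), p_whole
    p_patch = np.mean(
        [softmax(np.asarray(l, dtype=float)) for l in patch_logits], axis=0
    )
    p = whole_weight * p_whole + (1.0 - whole_weight) * p_patch
    return int(np.argmax(p)), p

# Illustrative values: the whole image narrowly favors class 0,
# while two discriminative patches clearly favor class 1.
label, probs = ensemble_predict(
    whole_logits=[2.0, 1.9, 0.0],
    patch_logits=[[0.0, 3.0, 0.0], [0.0, 2.5, 0.0]],
)
```

Here the patch evidence overrides the ambiguous whole-image score, which mirrors the abstract's point that local elements can make fine-grained differences more salient.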
Document type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/14747
Collection: Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)
吕乐. 基于深度稀疏表示的图像识别方法[D]. 北京. 中国科学院研究生院,2017.
Files in this item
File name/size: 毕业论文.pdf (12176KB)
Document type: Thesis
Access type: Not yet open
License: CC BY-NC-SA
