Weakly Supervised Semantic Segmentation of Web Images


	Weakly Supervised Semantic Segmentation of Web Images
	应鹏
	2016-05
Degree type	Master of Engineering
Abstract	Recent years have seen the rise of social networking and photo-sharing websites such as Flickr and Facebook. These sites encourage users to upload images and tag them to describe their content, and as a result they hold massive collections of web images that are growing exponentially. Managing and indexing these images requires image semantic understanding. Coarse-grained understanding, such as automatic image annotation, can no longer support tasks like fine-grained image retrieval or matching identical clothing items, so fine-grained, deeper understanding such as image semantic segmentation has become a practical need and, in recent years, a research focus in both academia and industry.
Image semantic segmentation combines image segmentation with region labeling; its goal is to assign a category or label to every pixel in an image. Traditional semantic segmentation methods are mostly fully supervised and need training samples with pixel-level annotations. Because such annotations must be produced manually and precisely, at a high cost in labor and time, fully supervised methods are ill-suited to large-scale applications. Images with image-level labels, by contrast, are easy to obtain from the Internet, and this massive supply of weakly labeled web images suggests an alternative: weakly supervised methods. Weak supervision here has two meanings. First, web images usually carry only image-level labels, not pixel-level labels. Second, because the labels are assigned manually by the users who upload the images, some of them are inevitably noisy. Training data that is easy to obtain and models that can scale to large applications have made weakly supervised methods an active research topic. Research on weakly supervised semantic segmentation has made some progress, but key problems remain open, such as how to use weak labels more effectively, how to overcome the interference of noisy labels during training, and how to improve the learnability of image regions. This thesis starts from these problems, studies weakly supervised image semantic segmentation, and proposes several solutions. The main contributions are as follows.
1. For the label-to-region assignment problem, we propose a weakly supervised image semantic segmentation method based on superpixel clustering. Sparse reconstruction of the original features (sparse coding of each superpixel) improves their discriminative power. Superpixel clustering, which includes a spectral clustering term and a discriminative clustering term, exploits the visual consistency of superpixels from the same class and the separability between classes, and produces clusters of superpixels (ideally semantic regions) that carry more meaningful semantics than independent superpixels. Labels are assigned at the cluster level, which makes class decisions more reliable. The method reaches a segmentation accuracy of 70% on MSRC-21 and 31% on LabelMe Outdoor (LMO), both higher than the best-performing weakly supervised methods at the time.
2. How to bring in image-level labels as weak supervision to guide region-level labeling is the core problem of weakly supervised semantic segmentation. We propose a discriminative learning model with an exclusive constraint for image semantic segmentation. The model suppresses each superpixel's responses to labels outside the label set of its parent image, which constrains and guides the label mapping, propagates image-level labels to their regions, and suppresses noisy labels to some extent. Compared with the superpixel clustering model above, it improves results on MSRC-21 and LMO by 4 and 3 percentage points, respectively.
3. For the feature representation of image regions, we bring convolutional neural networks (CNNs) into weakly supervised image semantic segmentation and propose a segmentation algorithm based on deep hierarchical features. A CNN learns the feature representation of superpixels, and superpixel context is built into the features: each superpixel is described jointly by features from four levels, namely the superpixel itself, its adjacent area, the sub-scene, and the scene. This alleviates geometric and semantic ambiguities and makes superpixels more semantically separable. An exclusive constraint term and a discriminative term are used to fit the weakly labeled data and learn the mapping from superpixels to image-level labels. The approach learns superpixel feature representations while avoiding the difficulty CNNs have converging under weak supervision. On the public datasets MSRC-21, LMO, VOC 2007, and VOC 2012, the model improves considerably on previous methods.
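As an illustration of the exclusive constraint described in contribution 2, the following Python sketch penalizes superpixel responses on labels absent from the parent image and restricts each superpixel's prediction to that image's label set. It is a minimal sketch, not the thesis's actual formulation: the squared penalty, the function names `exclusive_penalty` and `constrained_labels`, the array layout, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def exclusive_penalty(scores, image_index, image_labels):
    """Sum of squared superpixel responses on labels absent from the parent image.

    scores:       (n_superpixels, n_classes) real-valued class responses
    image_index:  (n_superpixels,) index of each superpixel's parent image
    image_labels: (n_images, n_classes) binary image-level label matrix
    (Hypothetical layout; the squared penalty is an assumption.)
    """
    allowed = image_labels[image_index].astype(bool)  # per-superpixel label mask
    excluded = np.where(allowed, 0.0, scores)         # keep only out-of-set responses
    return float(np.sum(excluded ** 2))


def constrained_labels(scores, image_index, image_labels):
    """Pick each superpixel's best-scoring label among its parent image's labels."""
    allowed = image_labels[image_index].astype(bool)
    masked = np.where(allowed, scores, -np.inf)       # rule out labels outside the set
    return masked.argmax(axis=1)


# Toy data (assumed): three superpixels, three classes, two images.
scores = np.array([[0.9, 0.2, 0.1],
                   [0.1, 0.8, 0.3],
                   [0.2, 0.1, 0.7]])
image_index = np.array([0, 0, 1])        # superpixels 0 and 1 belong to image 0
image_labels = np.array([[1, 1, 0],      # image 0 is tagged with classes 0 and 1
                         [1, 0, 0]])     # image 1 is tagged with class 0 only

penalty = exclusive_penalty(scores, image_index, image_labels)
labels = constrained_labels(scores, image_index, image_labels)
```

On the toy data, the third superpixel is assigned label 0 even though its raw score is highest for label 2, because its parent image carries only label 0. In a training objective, a penalty of this kind would be added to a discriminative term so that out-of-set responses are pushed toward zero.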
Keywords	image semantic segmentation; weak supervision; superpixel clustering; convolutional neural network
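Subject area	Pattern Recognition and Intelligent Systems
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/11595
Collection	Graduates_Master's Theses
Author affiliation	Institute of Automation, Chinese Academy of Sciences
First author affiliation	Institute of Automation, Chinese Academy of Sciences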
Recommended citation (GB/T 7714)	应鹏. 网络图像的弱监督语义分割[D]. 北京. 中国科学院大学,2016.

Files in this item
File name/size	Document type	Version type	Access type	License
应鹏硕士学位论文最终最终版.pdf (8262KB)	Thesis		Restricted access	CC BY-NC-SA