CASIA OpenIR
Research on Fine-Grained Pixel-Level Binary Classification of Images (精细化图像像素级二分类问题研究)
王裕沛
Subtype: Doctoral
Thesis Advisor: 黄凯奇
2019-06
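Degree Grantor: Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: pixel-level binary image classification; edge detection; shadow region segmentation; salient object segmentation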
Abstract

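Pixel-level image understanding is a fine-grained understanding of image content that lets computers perceive their environment more accurately and in finer detail. It covers a range of dense pixel-level tasks, such as edge detection and semantic segmentation, and has significant research and practical value. This thesis takes three widely studied pixel-level binary classification problems (image edge detection, shadow region segmentation, and salient object segmentation) as concrete tasks and studies fine-grained pixel-level binary classification methods based on fully convolutional networks. Owing to the strong feature representation capability of deep convolutional neural networks, pixel-level prediction based on fully convolutional networks has developed rapidly. A fully convolutional network obtains robust feature representations of the input image by stacking successive convolution and downsampling layers. However, the successive downsampling layers cause a severe loss of feature resolution, and the lost spatial details are crucial for fine-grained pixel-level prediction. This inherent contradiction is the core problem of fine-grained pixel-level binary classification based on deep convolutional neural networks. Focusing on this contradiction in the three tasks above, this thesis progressively refines the network architecture and studies how to effectively fuse robust discriminative information with rich spatial detail, so as to obtain fine-grained pixel-level binary classification results. The work is summarized as follows: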

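1. Edge detection is a representative pixel-level binary classification task. Edge detection methods based on deep convolutional neural networks have developed rapidly, but their results are still unsatisfactory. Visually, the detected edges are coarse; numerically, detection performance drops rapidly as the maximal permissible distance decreases. Earlier representative methods based on deep convolutional neural networks therefore localize edges poorly. To address this, this thesis proposes a fine-grained edge detection method based on backward refinement fusion and sub-pixel convolution, which effectively fuses discriminative features with spatial detail and yields edge detection results with stronger localization.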

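2. Shadow region segmentation is a widely studied pixel-level binary classification problem. Current representative methods are still easily disturbed by the background and struggle to capture spatial context accurately. Both high-level discriminative semantic information and low-level spatial detail are crucial for segmenting shadow regions accurately. Building on the proposed backward fusion network structure, this thesis proposes a deeply supervised parallel fusion network and a densely cascaded learning scheme, which fuse global discriminative semantic information and local spatial detail more comprehensively and implicitly introduce spatial context, yielding fine-grained shadow region segmentation results.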

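3. Salient object segmentation is a common pixel-level binary classification task. This thesis finds that earlier methods segment pixels inside the object fairly accurately, whereas their errors concentrate near object edges. Because object edges directly define object shape, they can provide additional auxiliary information. Building on the proposed backward fusion network structure, this thesis proposes an edge-guided fine-grained salient object segmentation network comprising two sub-networks, one for object segmentation and one for edge detection, which learn object mask segmentation and edge detection jointly. The multi-scale segmentation and edge features in the backward fusion branches of the two sub-networks are fused and shared, so that edge information guides fine-grained salient object segmentation.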
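
Item 1 above names sub-pixel convolution as the means of raising feature resolution. As a rough illustration only, the sketch below shows the rearrangement step commonly associated with that operation (often called pixel shuffle or depth-to-space): a feature map with C*r*r channels of size H x W is reshuffled into C channels of size (H*r) x (W*r). The function name `pixel_shuffle`, the nested-list representation, and the channel ordering are assumptions made for this example; the record does not describe how the thesis implements the layer.

```python
def pixel_shuffle(features, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Illustrative channel ordering (an assumption): output pixel
    (h*r + i, w*r + j) of channel c is read from input channel
    c*r*r + i*r + j at position (h, w).
    """
    channels_in = len(features)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    height, width = len(features[0]), len(features[0][0])
    channels_out = channels_in // (r * r)
    out = [[[0] * (width * r) for _ in range(height * r)]
           for _ in range(channels_out)]
    for c in range(channels_out):
        for i in range(r):
            for j in range(r):
                src = features[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


# Four 2x2 low-resolution channels become one 4x4 high-resolution channel.
low_res = [[[k * 10 + h * 2 + w for w in range(2)] for h in range(2)]
           for k in range(4)]
high_res = pixel_shuffle(low_res, 2)
for row in high_res[0]:
    print(row)
```

In a real network the rearrangement follows a learned convolution that produces the C*r*r channels, so the upsampling weights are trained rather than fixed as in bilinear interpolation.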

Other Abstract

Pixel-level image understanding aims at a precise, dense, pixel-level understanding of an image, which enables a computer to perceive its surrounding environment more accurately and in finer detail. Pixel-level image labeling covers various dense prediction tasks, including edge detection and semantic segmentation, and is important for both research and practical applications. This thesis studies dense pixel-level binary prediction based on the fully convolutional network (FCN) and focuses on three widely studied tasks: image edge detection, shadow detection, and salient object segmentation. Taking advantage of the feature representation power of deep convolutional networks, FCN-based pixel-level prediction has advanced rapidly. By stacking successive convolution and pooling layers, an FCN learns robust features. However, the repeated pooling operations cause a rapid loss of spatial resolution, and the lost spatial details are critical for precise pixel-level understanding. This contradiction is the intrinsic problem of FCN-based pixel-level binary classification. This thesis addresses the contradiction in the three common pixel-level binary prediction problems above. By progressively optimizing the network architecture, it investigates how to fuse robust discriminative features and rich spatial details more comprehensively, and thereby obtains precise pixel-level binary prediction results. The main contributions are summarized as follows:

1. For edge detection, methods based on deep convolutional neural networks (ConvNets) have advanced rapidly, but the detection results are still unsatisfactory. The detected edges are blurry, and performance drops dramatically when the maximal permissible distance used to match ground-truth edges during benchmarking is decreased. Both qualitative and quantitative results show that edge maps from a ConvNet are not well localized. To this end, this thesis uses a top-down backward refinement pathway that progressively combines the multi-scale features encoded in the ConvNet hierarchy while increasing the resolution of the feature maps, so as to generate crisp edges.


2. For shadow detection, existing state-of-the-art methods are still vulnerable to background clutter and often fail to capture the global context of an input image. Global contextual and semantic cues are essential for accurately localizing shadow regions, and rich spatial details are also required to segment shadow regions with precise shapes. To this end, this thesis presents a novel model characterized by a deeply supervised parallel fusion (DSPF) network and a densely cascaded learning scheme, built on the proposed top-down refinement pathway. The DSPF network fuses global semantic cues and local spatial details comprehensively through multiple stacked parallel top-down fusion branches, which are learned in a deeply supervised manner. The densely cascaded learning scheme is then employed to refine the details.


3. For salient object segmentation, state-of-the-art networks often produce blob-like segmentation maps without accurate object boundaries. Our key observation is that additional knowledge about object boundaries helps to identify the shape of an object precisely. Based on the top-down refinement pathway, we propose a novel deep model, the Focal Boundary Guided (Focal-BG) network, which consists of two interchanged sub-networks: a mask sub-network and a boundary sub-network. The model jointly learns to segment salient object masks and to detect salient object boundaries, and the features of the two sub-networks are fused and shared efficiently. Extensive experiments demonstrate that jointly modeling salient object boundaries and masks helps to capture shape details better, especially in the vicinity of object boundaries.
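
The first item's remark about the maximal permissible distance can be made concrete with a toy evaluation. The sketch below is a simplified stand-in for a benchmark protocol, not the one used in the thesis: it counts a predicted edge pixel as correct when any ground-truth pixel lies within `max_dist` (Euclidean), and symmetrically for recall, whereas standard edge benchmarks typically enforce a one-to-one matching. The names `matched_fraction` and `edge_f_measure` and the toy edge sets are hypothetical.

```python
import math


def matched_fraction(source, target, max_dist):
    """Fraction of source pixels with some target pixel within max_dist."""
    if not source:
        return 0.0
    hits = sum(
        1 for (r, c) in source
        if any(math.hypot(r - tr, c - tc) <= max_dist for (tr, tc) in target)
    )
    return hits / len(source)


def edge_f_measure(pred, gt, max_dist):
    """F-measure of predicted edge pixels under a distance tolerance.

    Simplification (an assumption of this sketch): nearest-neighbour
    matching is used instead of one-to-one assignment.
    """
    precision = matched_fraction(pred, gt, max_dist)
    recall = matched_fraction(gt, pred, max_dist)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Ground truth: a vertical edge at column 5.
# Prediction: the same edge, but localized two pixels to the right.
gt_edge = {(r, 5) for r in range(10)}
pred_edge = {(r, 7) for r in range(10)}

loose = edge_f_measure(pred_edge, gt_edge, max_dist=2)
strict = edge_f_measure(pred_edge, gt_edge, max_dist=1)
print(loose, strict)
```

A prediction that is consistently offset by two pixels scores perfectly under the looser tolerance and fails entirely under the stricter one, which is the kind of localization gap that tightening the tolerance exposes.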

Pages: 114
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/23992
Collection: Institute of Automation, Chinese Academy of Sciences
Recommended Citation
GB/T 7714
王裕沛. 精细化图像像素级二分类问题研究[D]. 中国科学院自动化研究所;中国科学院大学. 中国科学院自动化研究所;中国科学院大学,2019.
Files in This Item:
File Name/Size: Thesis——0530.pdf (10289KB)
DocType: Thesis
Access: Open Access
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.