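Research and Application of Deep Learning-Based Image Segmentation and Matting Algorithms
赵晓梅 (Zhao Xiaomei)
2020-05-27
Pages: 124
Degree type: Doctoral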
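Chinese Abstract (English translation)

In recent years, deep learning-based image segmentation and matting algorithms have developed rapidly in many fields, including computer-aided diagnosis and video editing. In computer-aided diagnosis, however, difficulties remain: training data are insufficient, sample distributions are imbalanced, and lesion boundaries are blurred. In video editing, motion-blur matting is a classic hard problem. Existing motion-blur matting methods depend on user interaction or on auxiliary short-exposure images, and they lack training data. This thesis studies these problems in depth and makes the following contributions:

1. A fully convolutional segmentation network based on image-patch classification is fused with a conditional random field network for brain tumor segmentation. In this fused model, the fully convolutional segmentation network is trained in patch-classification form to alleviate the shortage of training data and the sample imbalance, and is tested in slice-segmentation form to avoid excessive redundant computation. The conditional random field network is trained and tested in slice-segmentation form. It improves the appearance and spatial consistency of the segmentation results and thereby the accuracy of the segmentation boundaries. Because the two parts of the fused network need different training conditions, the whole network is trained in three steps: step 1, train the fully convolutional network in patch-classification form; step 2, fix the parameters of the fully convolutional network and train the conditional random field in slice-segmentation form; step 3, fine-tune the whole fused network in slice-segmentation form. Experiments on the BraTS 2013 and 2016 data show that the brain tumor segmentation performance of the fused network reached the state of the art in the field at the time.

2. Multi-view 2D fully convolutional segmentation networks are fused for 3D brain tumor segmentation. Because 3D convolutional networks are computationally too expensive, many researchers use 2D convolutional networks to segment 3D brain tumors slice by slice. A 2D convolutional network, however, cannot use the 3D information in magnetic resonance images. To address this, three 2D fully convolutional segmentation networks are trained with image patches from the axial, coronal, and sagittal directions respectively. The three networks segment the magnetic resonance images slice by slice from three different views and output three segmentation probability maps. The three probability maps are then fused, and the fused probability map is passed to a 3D conditional random field for refinement. Both fusing the probability maps from the three directions and using the 3D conditional random field make effective use of part of the 3D information in the magnetic resonance images. Experiments on the BraTS 2015 and 2017 data show that this brain tumor segmentation method reached an advanced level in the field at the time.

3. An automatic motion-blurred hand matting algorithm based on alpha-map estimation is proposed, and its matting results are used to correct human segmentation results. The algorithm uses an encoder-decoder network built on depthwise separable convolutions to estimate, end to end, the alpha map that represents foreground opacity, which removes the need of earlier motion-blur matting algorithms for user interaction or auxiliary short-exposure images. In addition, to overcome the lack of training data, a method for generating synthetic motion-blurred hand images is proposed, based on the patterns of hand motion and on the principle by which motion blur arises. The method generates synthetic motion-blurred hand images together with their ground-truth alpha maps. Quantitative tests on the synthetic data show that the matting performance of the algorithm reached an advanced level in the field at the time. Qualitative tests on real videos show that the matting network trained on synthetic data also performs well on real videos.

4. An automatic motion-blurred hand matting method that estimates the alpha map and the foreground image simultaneously is proposed. Previously, most deep learning-based matting algorithms estimated only the alpha map. With only the alpha map known, however, the background information in semi-transparent regions cannot be removed completely. In motion-blurred hand images the semi-transparent regions are large, and the residual background information severely reduces realism after the background is replaced. The proposed network, which estimates the alpha map and the foreground image simultaneously, effectively solves this problem. The network has a multi-task encoder-decoder structure: each task has its own independent decoder, but all tasks share one encoder. The shared encoder can extract more common features and thereby improve matting performance. In addition, a perceptual loss is added to the loss function to make the appearance of the matting results more plausible. Experiments show that both sharing the encoder and using the perceptual loss clearly improve matting performance, and that the network estimating the alpha map and the foreground image simultaneously outputs more realistic and natural matting results.

5. An image composition method for motion-blurred hands and portraits is proposed, and the composite images are used to train a network that mats motion-blurred hands and portraits simultaneously. In video applications such as video chat, motion-blurred hands often appear in the frame. Matting motion-blurred hands and portraits simultaneously requires a large amount of such training data. To address the lack of such data, this thesis composites synthetic motion-blurred hand images and portraits into a single image according to certain rules, and generates the ground-truth alpha map and the ground-truth foreground image at the same time. In addition, to make the composite images more realistic, a skin-color adjustment method is proposed so that the skin color of the hand matches that of the portrait's face as closely as possible. Experimental results show that the composite images generated by this method train the simultaneous matting network well, and that the trained network also performs well at hair strands and in motion-blurred regions of real videos.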
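
The multi-view fusion described in contribution 2 can be illustrated with a minimal Python sketch. It assumes the three probability maps are fused by simple averaging, which this abstract does not specify, and it omits the 3D conditional random field step. The names `toy_slice_model`, `segment_by_slices`, and `fuse_views` are hypothetical, the toy model merely stands in for a trained 2D network, and which array axis corresponds to the axial, coronal, or sagittal view depends on how the volume is stored.

```python
import numpy as np


def softmax(logits, axis):
    """Numerically stable softmax along one axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def toy_slice_model(slice_2d):
    """Hypothetical stand-in for a trained 2D network: returns (2, h, w) probabilities."""
    logits = np.stack([-slice_2d, slice_2d], axis=0)
    return softmax(logits, axis=0)


def segment_by_slices(volume, slice_axis, slice_model, num_classes):
    """Apply a 2D model to every slice along slice_axis; return (C, D, H, W) probabilities."""
    probs = np.zeros((num_classes,) + volume.shape)
    for i in range(volume.shape[slice_axis]):
        index = [slice(None)] * volume.ndim
        index[slice_axis] = i
        slice_probs = slice_model(volume[tuple(index)])
        # Write the per-class 2D result back into its position in the 3D volume.
        probs[(slice(None),) + tuple(index)] = slice_probs
    return probs


def fuse_views(volume, slice_models, num_classes):
    """Average the probability volumes from three slicing directions (assumed fusion rule)."""
    views = [
        segment_by_slices(volume, axis, model, num_classes)
        for axis, model in enumerate(slice_models)
    ]
    fused = np.mean(views, axis=0)
    # In the thesis a 3D conditional random field refines the fused map; omitted here.
    labels = fused.argmax(axis=0)
    return fused, labels


rng = np.random.default_rng(0)
volume = rng.normal(size=(4, 5, 6))  # toy volume, not real MRI data
fused, labels = fuse_views(volume, [toy_slice_model] * 3, num_classes=2)
print(fused.shape, labels.shape)
```

In practice each view would have its own trained network, so the list passed to `fuse_views` would hold three different models rather than the same toy function three times.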

English Abstract

In recent years, deep learning-based image segmentation and matting methods have improved greatly in many areas, such as computer-aided diagnosis and video editing. However, computer-aided diagnosis still faces difficult problems: the training data are insufficient and imbalanced, and lesion boundaries are usually fuzzy. In video editing, motion-blurred object matting is a hard problem. Existing motion-blurred object matting methods need user interaction or short-exposure frames, and they lack training data. To alleviate or overcome these problems, we propose several methods. Our main contributions are summarized as follows:

1. A brain tumor segmentation model is developed by integrating a patch-classification-based fully convolutional segmentation network with a conditional random field network. In this integrated model, the fully convolutional segmentation network is trained in the form of patch classification to alleviate the insufficiency and imbalance of training data, and it is tested in the form of slice segmentation to avoid redundant computation. The conditional random field network is trained and tested in the form of slice segmentation. It improves the appearance and spatial consistency of the segmentation results and thus yields more accurate segmentation boundaries. Because the two components of the integrated model must be trained under different conditions, the model is trained in three steps: step 1, train the fully convolutional segmentation network on patches; step 2, train the conditional random field network on slices with the parameters of the fully convolutional segmentation network fixed; step 3, fine-tune the whole integrated model on slices. Experiments on the BraTS 2013 and 2016 datasets show that the integrated model achieved state-of-the-art brain tumor segmentation performance at the time.

2. A 3D brain tumor segmentation model is developed by integrating multiple 2D fully convolutional segmentation networks. Because 3D convolutional networks carry a heavy computational burden, many researchers use 2D convolutional networks to segment 3D brain tumors slice by slice. However, 2D convolutional networks cannot use the 3D information in magnetic resonance images. To solve this problem, three 2D fully convolutional segmentation networks are trained on patches extracted from the axial, coronal, and sagittal views respectively. The three networks segment magnetic resonance images slice by slice from three different views and output three segmentation probability maps. These probability maps are fused and then passed to a 3D conditional random field for refinement. Both fusing the probability maps and employing the 3D conditional random field help to exploit part of the 3D information in magnetic resonance images. Experiments on the BraTS 2015 and 2017 datasets show that this segmentation model achieved performance comparable to other state-of-the-art brain tumor segmentation methods at the time.

3. An automatic motion-blurred hand alpha matting method is proposed and a related synthetic dataset is generated. The matting results are then used to correct human segmentation results. The method employs an encoder-decoder network, based on depthwise separable convolutions, to estimate end to end the alpha images that represent the opacity of motion-blurred hands. Unlike previous motion-blurred object matting methods, the proposed method needs neither user interaction nor short-exposure frames. Moreover, to solve the problem of lacking training data, we propose a method that generates synthetic motion-blurred hand images together with their ground-truth alpha images. This generation method is based on the regular patterns of hand motion and the principle of motion blur. Quantitative evaluations on the synthetic dataset show that the proposed matting method achieved matting performance at an advanced level for the time. Qualitative evaluations on real videos show that the matting model trained on the synthetic dataset also performs well on real videos.

4. An automatic motion-blurred hand matting method that simultaneously estimates alpha images and foreground images is proposed. Previously, most deep learning-based matting methods estimated only alpha images. However, if only the alpha image is known, the background information in semi-transparent regions cannot be completely removed. In motion-blurred hand images the semi-transparent regions are large, so the remaining background information noticeably reduces realism when the background is replaced. The proposed matting network, which simultaneously estimates alpha images and foreground images, can solve this problem. The network employs a multi-task encoder-decoder structure in which each task has its own independent decoder but all tasks share one encoder. The shared encoder can extract more common features, which helps to improve matting performance. In addition, a perceptual loss is added to the loss function to make the appearance of the matting results more plausible. Experiments show that both sharing the encoder and employing the perceptual loss clearly improve matting performance, and that the network estimating alpha images and foreground images simultaneously outputs more realistic and natural matting results.

5. A method is proposed to generate composite images that contain motion-blurred hands and portraits. The composite images are used to train matting networks that handle motion-blurred hands and portraits at the same time. In practical video applications, such as video chat, motion-blurred hands often appear. Performing motion-blurred hand matting and portrait matting at the same time requires a large number of training images. To solve the problem of lacking this kind of training data, we propose a method that generates composite images by combining synthetic motion-blurred hands with portraits according to certain rules. The method simultaneously generates the ground-truth alpha images and foreground images. To make the composite images look more realistic, a skin-color adjustment method is proposed that makes the skin color of the motion-blurred hands as similar as possible to the skin color of the portrait's face. Experimental results show that the composite images generated by the proposed method can train a good matting network for motion-blurred hands and portraits together. Moreover, the trained network also performs well around hair and motion-blurred hands in real videos.
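
Two ideas from contributions 3 and 4 can be sketched together in Python: motion blur modeled as the average of a sharp foreground over positions along its path, and the background residue left when only the alpha map is known. The sketch assumes a straight horizontal path, a uniform foreground color, and the standard compositing model I = αF + (1 − α)B. The thesis's actual generator follows hand-motion patterns that the abstract does not detail, and all function names here are hypothetical.

```python
import numpy as np


def synthesize_motion_blur(sharp_rgb, sharp_mask, shifts):
    """Average horizontally shifted copies of a sharp foreground.

    Returns the blurred alpha map and the premultiplied foreground (alpha * F).
    np.roll wraps around, so the object must stay away from the image border.
    """
    alpha = np.zeros(sharp_mask.shape, dtype=np.float64)
    premult = np.zeros(sharp_rgb.shape, dtype=np.float64)
    for dx in shifts:
        shifted_mask = np.roll(sharp_mask, dx, axis=1).astype(np.float64)
        shifted_rgb = np.roll(sharp_rgb, dx, axis=1).astype(np.float64)
        alpha += shifted_mask
        premult += shifted_rgb * shifted_mask[..., None]
    count = len(shifts)
    return alpha / count, premult / count


def composite(premult_fg, alpha, background):
    """Compositing model: I = alpha * F + (1 - alpha) * B, with alpha * F premultiplied."""
    return premult_fg + (1.0 - alpha[..., None]) * background


def demo():
    height, width = 8, 32
    mask = np.zeros((height, width))
    mask[2:6, 8:20] = 1.0  # a rectangular stand-in for a hand
    rgb = np.zeros((height, width, 3))
    rgb[...] = (0.9, 0.6, 0.5)  # uniform skin-like color (assumption)

    alpha, premult = synthesize_motion_blur(rgb, mask, shifts=range(8))

    old_bg = np.zeros((height, width, 3))
    old_bg[..., 1] = 1.0  # green background in the captured frame
    new_bg = np.zeros((height, width, 3))
    new_bg[..., 2] = 1.0  # blue replacement background

    observed = composite(premult, alpha, old_bg)
    # With the foreground known, the old background disappears entirely.
    with_foreground = composite(premult, alpha, new_bg)
    # With alpha only, the observed colors are reused as if they were foreground.
    alpha_only = composite(alpha[..., None] * observed, alpha, new_bg)

    residual = np.abs(alpha_only - with_foreground).max(axis=-1)
    return alpha, residual


alpha, residual = demo()
semi_transparent = (alpha > 0) & (alpha < 1)
print("largest residue in semi-transparent pixels:", residual[semi_transparent].max())
```

Under this model the difference between the two composites works out to α(1 − α)(B − F), where B is the old background, so it vanishes where α is 0 or 1 and is largest in the semi-transparent band that motion blur widens.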

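Keywords: deep learning; brain tumor segmentation; fully convolutional network; conditional random field network; motion-blurred hands; automatic matting; synthetic data generation
Language: Chinese
Seven major directions, sub-direction classification: Image and video processing and analysis
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/39130
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems_Robot Vision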
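Recommended citation
GB/T 7714
Zhao Xiaomei. Research and Application of Deep Learning-Based Image Segmentation and Matting Algorithms [D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2020.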
Files in this item
File name/size | Document type | Version type | Access type | License
Thesis-赵晓梅.pdf (13214KB) | Thesis | | Open access | CC BY-NC-SA