Visual scene parsing plays an important role in computer vision. It aims to produce pixel-level discrimination for visual images, endowing computers with the ability to accurately perceive and understand the real world through visual input, and it has important applications in autonomous driving, robotic visual navigation, and remote sensing analysis. In recent years, with the rapid development of deep learning, many visual scene parsing models have emerged. However, to achieve reliable performance in practical applications, these models generally require large amounts of pixel-level annotations for training, which must be produced by human annotators for each target application scenario. This reliance on large quantities of precise annotations from professional annotators imposes heavy time and economic costs on data acquisition, hindering the rapid generalization and deployment of deep visual scene parsing models in new applications.

To alleviate this annotation burden, researchers have proposed a new learning paradigm that trains visual scene parsing models with weakly-supervised annotations. Typical weak annotations include image-level class labels, bounding boxes, and sparse point or scribble labels. Compared with accurate pixel-level annotations, these coarse labels are much easier to obtain, effectively reducing the cost of collecting training samples. However, lacking accurate supervision, weakly-supervised methods face many challenges in pixel-level visual scene parsing, such as missing parts of targets and category confusion.
To this end, this paper conducts research progressively on four aspects, exploring how to more effectively mine and utilize data information under weak supervision and how to improve the ability of weakly-supervised models on visual scene parsing tasks. The main contributions include:
- proposes a multi-image, cross-image information transfer method for weakly-supervised visual scene parsing, which mines the latent relationships between different samples to compensate for the scarcity of information under weak supervision. This work is the first to use relationship information between different images to assist the training of weakly-supervised semantic segmentation models. It models the correlations among pixels in different images, transmits and shares information across images during training, obtains more consistent representations by collaboratively leveraging multiple images, and thereby improves the learning of weakly-supervised models.
- proposes a multi-target integration method for weakly-supervised visual scene parsing, which combines multiple methods and models to mine latent supervision so that information can be discovered and utilized more fully in weakly-supervised settings. This work analyzes the non-uniqueness of weakly-supervised pseudo-labels, finds that different pseudo-labels carry complementary information, and proposes an approach that jointly uses these multiple targets to train weakly-supervised models. The approach leverages the robustness of deep models and a noise-adaptation strategy to effectively extract the complementary information from multiple targets, and it achieves significant improvement over single-target approaches.
- proposes a visual scene parsing method for multi-type weak supervision, studying how to exploit both the class and the spatial information in weak labels to accomplish parsing tasks in complex scenes. This work uses point labels as the supervision carrier, jointly handles the semantic and instance discrimination tasks, and obtains strong panoptic segmentation models under weak supervision. It proposes a transition-cost-based framework that models the transition costs between adjacent pixels and uniformly handles the semantic and instance parsing sub-tasks of visual scene parsing. The proposed approach can effectively train panoptic segmentation models with weak supervision and achieves state-of-the-art performance on large-scale datasets.
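As a rough illustration of the cross-image idea in the first contribution, the sketch below computes an affinity matrix between the pixel features of two images and uses it to propagate per-pixel class scores from one image to the other. This is a minimal, hypothetical example; the function names and the softmax weighting are assumptions, not the thesis's exact formulation.

```python
import numpy as np

def cross_image_affinity(feat_a, feat_b):
    """Cosine-similarity affinity between all pixel pairs of two images.

    feat_a: (N, C) pixel features of image A; feat_b: (M, C) of image B.
    Returns an (N, M) affinity matrix.
    """
    a = feat_a / (np.linalg.norm(feat_a, axis=1, keepdims=True) + 1e-8)
    b = feat_b / (np.linalg.norm(feat_b, axis=1, keepdims=True) + 1e-8)
    return a @ b.T

def transfer_scores(affinity, scores_b):
    """Propagate per-pixel class scores from image B to image A.

    affinity: (N, M); scores_b: (M, K) class scores.
    Rows are softmax-normalized so each A-pixel aggregates the
    B-pixels it is most related to.
    """
    w = np.exp(affinity - affinity.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ scores_b
```

During training, scores transferred this way could serve as an extra consistency signal alongside each image's own predictions.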
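The multi-target integration in the second contribution can be illustrated with a deliberately simple consensus scheme: pixels where several pseudo-label maps agree keep that class, and conflicting pixels are marked with an ignore index so the training loss skips them. This is only a hypothetical sketch; the thesis's noise-adaptation strategy is more elaborate.

```python
import numpy as np

IGNORE = 255  # a common "ignore" index in segmentation losses

def merge_pseudo_labels(label_maps, ignore=IGNORE):
    """Merge several pseudo-label maps into one training target.

    label_maps: list of (H, W) integer arrays produced by different
    weakly-supervised methods. Pixels on which all maps agree keep
    that class; conflicting pixels are set to `ignore`.
    """
    stack = np.stack(label_maps)            # (T, H, W)
    first = stack[0]
    agree = np.all(stack == first, axis=0)  # (H, W) consensus mask
    merged = np.where(agree, first, ignore)
    return merged.astype(np.int64)
```

A hard-consensus rule like this discards disputed pixels entirely, whereas a noise-adaptation strategy can still learn from them by modeling how each pseudo-label source errs.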
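The transition-cost idea in the third contribution resembles a multi-source shortest-path expansion: each unlabeled pixel takes the class of the annotated point that is cheapest to reach, where stepping between adjacent pixels incurs a cost reflecting how different they are. The sketch below is an assumed simplification that uses raw intensity differences as the transition cost, not the thesis's learned costs.

```python
import heapq

def expand_point_labels(image, points):
    """Expand sparse point annotations to dense labels via minimum
    accumulated transition cost (multi-source Dijkstra on the grid).

    image:  2-D list of scalar intensities (a stand-in for features).
    points: dict {(row, col): class_id} of annotated pixels.
    Stepping between 4-connected neighbors costs their absolute
    intensity difference, so labels spread cheaply inside uniform
    regions and stop at strong boundaries.
    """
    h, w = len(image), len(image[0])
    dist = [[float("inf")] * w for _ in range(h)]
    label = [[None] * w for _ in range(h)]
    heap = []
    for (r, c), cls in points.items():
        dist[r][c] = 0.0
        label[r][c] = cls
        heapq.heappush(heap, (0.0, r, c, cls))
    while heap:
        d, r, c, cls = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                nd = d + abs(image[nr][nc] - image[r][c])
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    label[nr][nc] = cls
                    heapq.heappush(heap, (nd, nr, nc, cls))
    return label
```

Because the same expansion runs per annotated point regardless of whether it marks a "stuff" class or an object instance, a cost-based formulation of this kind can treat semantic and instance targets uniformly.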
In summary, for the problem of weakly-supervised visual scene parsing, this paper first studies the information utilization mechanism within single images. It then mines latent information from the perspectives of data and of models, proposing multi-image information transfer and multi-target integration methods for weakly-supervised visual scene parsing, respectively. Finally, it studies how to exploit multi-type weak supervision that combines class and spatial information to accomplish complex visual scene parsing tasks, realizing a weakly-supervised panoptic segmentation approach. Compared with concurrent works, the proposed approaches achieve significant improvements and leading performance on the standard datasets in the field. They effectively alleviate the problems of missing segmentation targets and category confusion in weakly-supervised visual parsing, and have both academic novelty and practical application value.
|Keywords||weakly-supervised learning; visual scene parsing; semantic segmentation; panoptic segmentation|
|樊峻菘 (Fan Junsong). Research on Visual Scene Parsing Methods under Weak Supervision [D]. Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences, 2022.|