Visual scene parsing plays an important role in computer vision. It aims to produce pixel-level discrimination for visual images, endowing computers with the ability to accurately perceive and understand the real world through visual input, and it has important applications in autonomous driving, robotic visual navigation, and remote sensing analysis. In recent years, with the rapid development of deep learning, many visual scene parsing models have emerged. However, to achieve reliable performance in practical applications, these models generally require large amounts of pixel-level annotations for training, which must be produced by human annotators for each target application scenario. This reliance on large quantities of precise annotations from professional annotators imposes heavy time and economic costs on data acquisition, hindering the rapid generalization and deployment of deep visual scene parsing models in new applications.

To alleviate this annotation burden, researchers have proposed a new learning paradigm that trains visual scene parsing models with weakly-supervised annotations. Typical weak annotations include image-level class labels, bounding boxes, and sparse point or scribble labels. Compared with accurate pixel-level annotations, these coarse labels are much easier to obtain, effectively reducing the cost of collecting training samples. However, lacking accurate supervision, weakly-supervised methods face many challenges in pixel-level visual scene parsing, such as missing parts of targets and category confusion.
To this end, this paper conducts research progressively on four aspects, exploring how to more effectively mine and utilize data information under weak supervision and how to improve the ability of weakly-supervised models on visual scene parsing tasks. The main contributions include:
- proposes a multi-image, cross-image information transfer method for weakly-supervised visual scene parsing, which mines the latent relationships between different samples to compensate for the scarcity of information under weak supervision. This work is the first to use relationship information between different images to assist the training of weakly-supervised semantic segmentation models. It models the correlations among pixels in different images, transmits and shares information across images during training, obtains more consistent representations by collaboratively leveraging multiple images, and thereby improves the learning of weakly-supervised models.
- proposes a multi-target integration method for weakly-supervised visual scene parsing, which combines multiple methods and models to mine latent supervision so that information can be discovered and utilized more fully in weakly-supervised settings. This work analyzes the non-uniqueness of weakly-supervised pseudo-labels, finds that different pseudo-labels carry complementary information, and proposes an approach that jointly uses these multiple targets to train weakly-supervised models. The approach leverages the robustness of deep models and a noise-adaptation strategy to effectively extract the complementary information from multiple targets, and it achieves significant improvement over single-target approaches.
- proposes a visual scene parsing method for multi-type weak supervision, studying how to exploit both the class and the spatial information in weak labels to accomplish parsing tasks in complex scenes. This work uses point labels as the supervision carrier, jointly handles the semantic and instance discrimination tasks, and obtains strong panoptic segmentation models under weak supervision. It proposes a transition-cost-based framework that models the transition costs between adjacent pixels and uniformly handles the semantic and instance parsing sub-tasks of visual scene parsing. The proposed approach can effectively train panoptic segmentation models with weak supervision and achieves state-of-the-art performance on large-scale datasets.
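As a rough illustration of the cross-image idea in the first contribution, the sketch below computes an affinity matrix between the pixel features of two images and uses it to propagate per-pixel class scores from one image to the other. This is a minimal, hypothetical example; the function names and the softmax weighting are assumptions, not the thesis's exact formulation.

```python
import numpy as np

def cross_image_affinity(feat_a, feat_b):
    """Cosine-similarity affinity between all pixel pairs of two images.

    feat_a: (N, C) pixel features of image A; feat_b: (M, C) of image B.
    Returns an (N, M) affinity matrix.
    """
    a = feat_a / (np.linalg.norm(feat_a, axis=1, keepdims=True) + 1e-8)
    b = feat_b / (np.linalg.norm(feat_b, axis=1, keepdims=True) + 1e-8)
    return a @ b.T

def transfer_scores(affinity, scores_b):
    """Propagate per-pixel class scores from image B to image A.

    affinity: (N, M); scores_b: (M, K) class scores.
    Rows are softmax-normalized so each A-pixel aggregates the
    B-pixels it is most related to.
    """
    w = np.exp(affinity - affinity.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ scores_b
```

During training, scores transferred this way could serve as an extra consistency signal alongside each image's own predictions.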
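The multi-target integration in the second contribution can be illustrated with a deliberately simple consensus scheme: pixels where several pseudo-label maps agree keep that class, and conflicting pixels are marked with an ignore index so the training loss skips them. This is only a hypothetical sketch; the thesis's noise-adaptation strategy is more elaborate.

```python
import numpy as np

IGNORE = 255  # a common "ignore" index in segmentation losses

def merge_pseudo_labels(label_maps, ignore=IGNORE):
    """Merge several pseudo-label maps into one training target.

    label_maps: list of (H, W) integer arrays produced by different
    weakly-supervised methods. Pixels on which all maps agree keep
    that class; conflicting pixels are set to `ignore`.
    """
    stack = np.stack(label_maps)            # (T, H, W)
    first = stack[0]
    agree = np.all(stack == first, axis=0)  # (H, W) consensus mask
    merged = np.where(agree, first, ignore)
    return merged.astype(np.int64)
```

A hard-consensus rule like this discards disputed pixels entirely, whereas a noise-adaptation strategy can still learn from them by modeling how each pseudo-label source errs.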
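The transition-cost idea in the third contribution resembles a multi-source shortest-path expansion: each unlabeled pixel takes the class of the annotated point that is cheapest to reach, where stepping between adjacent pixels incurs a cost reflecting how different they are. The sketch below is an assumed simplification that uses raw intensity differences as the transition cost, not the thesis's learned costs.

```python
import heapq

def expand_point_labels(image, points):
    """Expand sparse point annotations to dense labels via minimum
    accumulated transition cost (multi-source Dijkstra on the grid).

    image:  2-D list of scalar intensities (a stand-in for features).
    points: dict {(row, col): class_id} of annotated pixels.
    Stepping between 4-connected neighbors costs their absolute
    intensity difference, so labels spread cheaply inside uniform
    regions and stop at strong boundaries.
    """
    h, w = len(image), len(image[0])
    dist = [[float("inf")] * w for _ in range(h)]
    label = [[None] * w for _ in range(h)]
    heap = []
    for (r, c), cls in points.items():
        dist[r][c] = 0.0
        label[r][c] = cls
        heapq.heappush(heap, (0.0, r, c, cls))
    while heap:
        d, r, c, cls = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                nd = d + abs(image[nr][nc] - image[r][c])
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    label[nr][nc] = cls
                    heapq.heappush(heap, (nd, nr, nc, cls))
    return label
```

Because the same expansion runs per annotated point regardless of whether it marks a "stuff" class or an object instance, a cost-based formulation of this kind can treat semantic and instance targets uniformly.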
In summary, for the problem of weakly-supervised visual scene parsing, this paper first studies the information utilization mechanism within single images. It then mines latent information from the perspectives of data and of models, proposing multi-image information transfer and multi-target integration methods for weakly-supervised visual scene parsing, respectively. Finally, it studies how to exploit multi-type weak supervision that combines class and spatial information to accomplish complex visual scene parsing tasks, realizing a weakly-supervised panoptic segmentation approach. Compared with concurrent works, the proposed approaches achieve significant improvements and leading performance on the standard datasets in the field. They effectively alleviate the problems of missing segmentation targets and category confusion in weakly-supervised visual parsing, and have both academic novelty and practical application value.
|Keywords||weakly-supervised learning; visual scene parsing; semantic segmentation; panoptic segmentation|
|樊峻菘 (Fan Junsong). Research on Visual Scene Parsing Methods under Weak Supervision [D]. Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences, 2022.|