面向小样本场景的语义分割方法研究 (Research on Semantic Segmentation Methods for Few-shot Scenarios)
毛彬杰 (Mao Binjie)
2023-05
Pages: 91
Subtype: Doctoral dissertation
Abstract

Semantic segmentation models based on deep learning have been widely applied in fields such as autonomous driving, machine vision, and remote sensing segmentation. However, existing semantic segmentation models typically require large numbers of pixel-level annotated samples, and they also struggle with categories unseen during training. To address these problems, researchers proposed the semantic segmentation task under few-shot scenarios, i.e., the few-shot semantic segmentation task, which aims to enable a segmentation model to segment novel categories given only a very small amount of annotated data for those categories.

Because the number of given novel-class samples is extremely small, it is difficult to train a separate model for the novel categories by retraining or fine-tuning, so previous semantic segmentation models cannot be directly transferred to the few-shot segmentation task. Although some methods can already handle this task, current few-shot segmentation still faces the following problems. (1) Domain differences in the data. Under the few-shot segmentation setting, the distribution gap between support samples and query samples makes the model's understanding of the category ambiguous. (2) Scarcity of novel-class samples. With extremely few novel-class samples, a network can hardly learn appropriate parameters through conventional methods. (3) Few-shot segmentation in video scenarios. Most existing few-shot segmentation work is confined to the image domain and rarely explores video segmentation under few-shot conditions by building an effective spatio-temporal information learning mechanism. Effectively solving these problems to improve the performance of current few-shot segmentation models is both significant and highly challenging.

To solve the above problems, this dissertation studies and explores the semantic segmentation task under few-shot scenarios and provides a corresponding solution for each problem. The main research contents and contributions of this dissertation are summarized as follows:

1. To address the domain gap between support and query samples, this dissertation proposes a task-guided few-shot semantic segmentation method. The method builds a task-aware adaptive module to extract task information and strengthen the interaction between support and query samples, thereby alleviating their domain difference. Specifically, task-specific information is first extracted from the current input by the task-aware adaptive module, and is then applied to both the channel and spatial dimensions of the features through an attention mechanism to achieve adaptive feature enhancement. In addition, the method further refines the prediction by progressively adding detail information to it. Thanks to its non-parametric aggregation operations, the proposed network can handle segmentation tasks under any number of shots without structural adjustments. Finally, experiments on several public benchmark datasets verify the effectiveness of the proposed method.

2. To address sample scarcity in few-shot segmentation, this dissertation proposes a dual-prototype few-shot semantic segmentation method. Unlike most previous few-shot segmentation methods, which focus only on extracting information from the support samples, this method explores extracting information from the query samples as well, improving the utilization of scarce novel-class samples. Specifically, the model uses not only prototypes extracted from the support samples but also pseudo-prototypes extracted from the query samples by a cycle comparison module to guide segmentation of the query samples. The cycle comparison module first selects reliable foreground features according to a cycle-consistency criterion and generates the corresponding pseudo-prototype features; the correlation between the prototypes and pseudo-prototypes is then exploited to enhance both. In addition, multi-scale context information is introduced into the dense matching between the two kinds of prototypes and the query features, improving the accuracy of the segmentation results. Finally, extensive experiments on the corresponding benchmark datasets verify the method's effectiveness.

3. For few-shot segmentation in video scenarios, this dissertation proposes a prototype-evolution few-shot video object segmentation method. By building a prototype evolution module to propagate temporal information through the query video, the method exploits both the correlation between the video to be segmented and the support images, and the temporal information contained in the video itself. Specifically, a prototype-based framework is first established to relate the support images to the target video frames; this framework has fewer parameters and faster inference. A prototype evolution module then integrates temporal information into the evolution of the video prototype features, whose memory footprint does not grow with the number of video frames. In addition, the method exploits high-level features, trading a small amount of speed for higher accuracy. Finally, experiments on several public datasets show that the proposed method has advantages in both accuracy and speed.

Other Abstract

Semantic segmentation models based on deep learning have been widely used in fields such as autonomous driving, robot vision, and remote sensing segmentation. However, the existing semantic segmentation models typically require a large number of pixel-level annotated samples, and struggle with categories not seen during training. To overcome these limitations, researchers have proposed the semantic segmentation task in few-shot scenarios, also known as the few-shot semantic segmentation task. This task aims to enable the model to perform segmentation on novel categories with only a very small amount of annotated data.

Since the number of novel-class samples provided is extremely limited, it is challenging to train a separate new model for the novel categories by retraining or fine-tuning the existing one. As a result, previous semantic segmentation models are not directly applicable to the few-shot segmentation task. Although some methods have already been proposed to tackle this task, current few-shot semantic segmentation still faces the following problems. (1) Domain differences between the support samples and the query samples. In the few-shot segmentation task, the distribution difference between the support samples and the query samples leads to ambiguity in the model's understanding of the category. (2) Scarcity of novel-class samples. With only a very small number of novel-class samples, it is difficult for the network to learn appropriate parameters through conventional methods. (3) Few-shot segmentation in video scenarios. Most existing few-shot segmentation work focuses on images and rarely explores video segmentation by establishing an effective spatio-temporal information learning mechanism. Effectively solving the above problems to improve the performance of current few-shot segmentation models is significant and challenging.

In order to address the aforementioned problems, this dissertation studies and explores the few-shot semantic segmentation task and provides effective solutions for each problem. The main research contents and contributions of this dissertation are summarized as follows:

1. To address the domain gap between support samples and query samples, a task-guided method for few-shot semantic segmentation is proposed in this dissertation. The proposed method extracts task information by building a task-aware adaptive module to strengthen the interaction between support samples and query samples, thereby alleviating the domain difference. Specifically, task-specific information is first extracted from the current input through the task-aware adaptive module, and then simultaneously applied to the channel and spatial dimensions of the features through an attention mechanism to achieve adaptive feature enhancement. Additionally, the method further optimizes the predicted results by adding detailed information to them step by step. Through non-parametric aggregation operations, the proposed network can handle any-shot segmentation tasks without structural adjustments. Finally, extensive experiments on several publicly available benchmark datasets verify the effectiveness of the proposed method.
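As a rough illustration of this idea, the following NumPy sketch pools the masked support features into a task vector and uses it to gate the query features along the channel and then the spatial dimensions. All function names are illustrative, and the real module is a learned network, not these fixed sigmoid gates:

```python
import numpy as np

def masked_average_pool(feat, mask):
    """Collapse a support feature map (C, H, W) into a single task/prototype
    vector (C,) by averaging only over foreground pixels of the mask (H, W)."""
    denom = mask.sum() + 1e-8
    return (feat * mask[None]).sum(axis=(1, 2)) / denom

def task_adaptive_enhance(query_feat, task_vec):
    """Reweight query features (C, H, W) first along the channel dimension
    with a gate derived from the task vector (C,), then along the spatial
    dimensions with a gate derived from the channel-enhanced activations."""
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))
    channel_gate = sigmoid(task_vec)[:, None, None]       # (C, 1, 1)
    enhanced = query_feat * channel_gate                  # channel attention
    spatial_gate = sigmoid(enhanced.mean(axis=0))[None]   # (1, H, W)
    return enhanced * spatial_gate                        # spatial attention

# Toy usage with random features standing in for backbone outputs.
rng = np.random.default_rng(0)
support = rng.standard_normal((8, 4, 4))
mask = np.zeros((4, 4)); mask[1:3, 1:3] = 1.0
query = rng.standard_normal((8, 4, 4))

task_vec = masked_average_pool(support, mask)  # task information, shape (8,)
out = task_adaptive_enhance(query, task_vec)   # enhanced query, (8, 4, 4)
```

Because the pooling and gating contain no shape-dependent parameters, averaging over K support images instead of one changes nothing structurally, which is the sense in which a non-parametric aggregation supports any-shot settings.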

2. To address the issue of sample scarcity in the few-shot semantic segmentation task, a dual-prototype-based few-shot segmentation method is proposed in this dissertation. Unlike most previous approaches, which only focus on extracting information from support samples, the proposed method explores obtaining information from query samples as well, thereby improving the utilization of the scarce novel-class samples. Specifically, the method utilizes not only prototypes extracted from the support samples, but also pseudo-prototypes extracted from the query samples via a cycle comparison module, to guide the segmentation of the query samples. In this method, a cycle comparison module is first designed to select reliable foreground features through a cycle-consistency criterion and to generate the corresponding pseudo-prototype. The correlations between the prototypes and pseudo-prototypes are then utilized to enhance each other. In addition, the proposed method introduces multi-scale context information into the dense matching between the two kinds of prototypes and the query features, thereby improving the accuracy of the segmentation results. Finally, the proposed method is extensively evaluated on the corresponding benchmark datasets to verify its effectiveness.
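One simplified reading of the cycle-consistency selection is: a query pixel is a reliable foreground candidate if its nearest support pixel lies in the support foreground and the round trip query → support → query returns to the same pixel. The sketch below (NumPy; illustrative names, not the dissertation's actual module) pools such pixels into a pseudo-prototype:

```python
import numpy as np

def l2_normalize(x, axis=-1):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)

def cycle_consistent_pseudo_prototype(support_feat, support_mask, query_feat):
    """Select query pixels whose best-matching support pixel is foreground
    AND whose round trip query -> support -> query closes on themselves,
    then average them into a (C,) pseudo-prototype. Returns None if no
    query pixel passes the check."""
    C = query_feat.shape[0]
    q = l2_normalize(query_feat.reshape(C, -1).T)    # (Nq, C)
    s = l2_normalize(support_feat.reshape(C, -1).T)  # (Ns, C)
    sim = q @ s.T                                    # (Nq, Ns) cosine sims
    fwd = sim.argmax(axis=1)                         # query -> support match
    bwd = sim.argmax(axis=0)                         # support -> query match
    fg = support_mask.reshape(-1)[fwd] > 0.5         # match lands on foreground
    cyc = bwd[fwd] == np.arange(q.shape[0])          # round trip closes
    keep = fg & cyc
    if not keep.any():
        return None
    return query_feat.reshape(C, -1)[:, keep].mean(axis=1)

# Toy example: 2x2 feature maps whose flattened pixel j has feature e_j,
# with only support pixel 0 marked foreground.
support_feat = np.eye(4).reshape(4, 2, 2)
query_feat = np.eye(4).reshape(4, 2, 2)
support_mask = np.array([[1.0, 0.0], [0.0, 0.0]])
pseudo = cycle_consistent_pseudo_prototype(support_feat, support_mask, query_feat)
# -> only query pixel 0 survives, so pseudo equals its feature e_0
```

In the actual method this pseudo-prototype is used alongside the support prototype, with the two enhancing each other before dense matching against the query features.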

3. Aiming at the few-shot segmentation problem in video scenarios, a prototype-evolution-based few-shot video object segmentation method is proposed in this dissertation. By building a prototype evolution module to propagate temporal information through the query video, this method utilizes both the correlation between the video frames and the support images, and the temporal information contained in the video itself. Specifically, a prototype-based framework is established to build the relationship between the support images and the target video frames, with fewer parameters and faster inference speed. Then, a prototype evolution module is built to integrate temporal information into the evolution process of the video prototype features. Additionally, the method proposes an approach that takes advantage of high-level features to trade a small amount of speed for higher accuracy. Finally, experiments on multiple public datasets show that the proposed method has clear advantages in terms of both accuracy and speed.
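The constant-memory property can be sketched with an exponential-moving-average update: each frame is segmented against the current prototype, and the predicted foreground is folded back into it, so storage stays O(C) regardless of video length. This EMA rule is an illustrative stand-in for the dissertation's learned prototype evolution module, and all names here are hypothetical:

```python
import numpy as np

def segment_frame(frame_feat, prototype, threshold=0.5):
    """Score each pixel of a (C, H, W) frame by cosine similarity to the
    (C,) prototype and threshold into a binary foreground mask (H, W)."""
    C, H, W = frame_feat.shape
    flat = frame_feat.reshape(C, -1)
    norms = np.linalg.norm(flat, axis=0) * np.linalg.norm(prototype) + 1e-8
    sims = (flat * prototype[:, None]).sum(axis=0) / norms
    return (sims > threshold).astype(float).reshape(H, W)

def evolve_prototype(prototype, frame_feat, frame_mask, momentum=0.9):
    """Fold the current frame's predicted-foreground mean feature into the
    running video prototype with an exponential moving average; the stored
    state never grows with the number of frames."""
    denom = frame_mask.sum()
    if denom < 1e-8:                 # nothing predicted: keep prototype as-is
        return prototype
    frame_proto = (frame_feat * frame_mask[None]).sum(axis=(1, 2)) / denom
    return momentum * prototype + (1.0 - momentum) * frame_proto

# Per-frame loop: segment, then evolve the prototype with the new evidence.
prototype = np.array([1.0, 0.0])     # initial prototype from the support set
frames = [np.stack([np.ones((2, 2)), np.zeros((2, 2))])]  # one toy frame
for frame in frames:
    mask = segment_frame(frame, prototype)
    prototype = evolve_prototype(prototype, frame, mask)
```

A prototype-level comparison like this is far cheaper than dense frame-to-frame matching, which is consistent with the speed advantage the abstract claims, while the high-level-feature refinement it mentions would sit on top of this basic loop.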

Keywords: few-shot semantic segmentation; semantic segmentation; few-shot learning; few-shot video object segmentation
Language: Chinese
Sub-direction classification: Image and Video Processing and Analysis
Planning direction of the national key laboratory: Visual Information Processing
Paper associated data
Document Type: Dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/52231
Collection: Graduates_Doctoral Dissertations
State Key Laboratory of Multimodal Artificial Intelligence Systems_Advanced Spatio-temporal Data Analysis and Learning
Recommended Citation
GB/T 7714
毛彬杰. 面向小样本场景的语义分割方法研究[D], 2023.
Files in This Item:
File Name/Size: 面向小样本场景的语义分割方法研究.pdf (13534 KB)
DocType: Dissertation
Access License: Restricted open access, CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.