Research on Key Region Segmentation Methods for Endoscopic Images in Minimally Invasive Surgery


	Research on Key Region Segmentation Methods for Endoscopic Images in Minimally Invasive Surgery
	倪振梁 (Ni Zhenliang)
	2022-05-26
Pages	123
Degree type	Doctoral
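Abstract (translated from Chinese)	Minimally invasive surgery offers small incisions, a low infection rate, and fast postoperative recovery, and it has broad prospects for development. Minimally invasive surgical robots are now widely used in such procedures; they have markedly improved surgeons' working conditions and the safety of surgery. Their use, however, raises the learning cost for surgeons. A navigation system for minimally invasive surgical robots would help shorten the surgeon's learning period and further improve surgical safety. The core difficulty of such navigation is that the robot needs visual perception comparable to a surgeon's; specifically, it must be able to obtain the positions of key objects from endoscopic images. Semantic segmentation can help the robot identify objects and locate their boundaries. This thesis therefore studies the segmentation of key regions in endoscopic images, with the aim of providing algorithmic support for the navigation of minimally invasive surgical robots. The main contents and contributions are as follows. (1) To address illumination problems such as reflections and shadows in surgical instrument segmentation from endoscopic images, the thesis proposes the pyramid attention aggregation network, which improves feature representation by aggregating multi-scale attention features. The network contains two new modules, the double attention module and the pyramid upsampling module. The double attention module captures semantic relationships between positions and between channels and uses them to infer the semantics of regions disturbed by illumination, which raises segmentation accuracy for instruments under reflections and shadows. The pyramid upsampling module aggregates multi-scale attention features to capture local detail and global information, further improving the feature representation. Experiments show that the pyramid attention aggregation network achieves excellent segmentation performance on the EndoVis 2017 endoscopic surgical instrument segmentation dataset. (2) To address scale variation and the segmentation of small instruments in the same task, the thesis proposes the adaptive receptive field network, which adaptively selects a suitable receptive field for instruments of different scales. Within it, the adaptive receptive field module selects features with suitable receptive fields to follow changes in instrument scale: it takes multi-scale features as input and uses inter-channel relationships to filter them. The module refines multi-scale features and covers a wider range of scales, which also helps with small instruments. In addition, a bilinear attention module is designed to capture semantic relationships between channels so that the network attends more to key regions. Because this module uses channel relationships to strengthen instrument recognition rather than relying only on local color and texture, it is not disturbed by reflections and shadows and so helps with the illumination problem. The network also uses a dilated residual network to retain more detail features, which raises accuracy on small instruments. Experiments show that the adaptive receptive field network achieves excellent segmentation performance on the EndoVis 2018 endoscopic surgical instrument segmentation dataset. (3) To address the similarity of local features in soft tissue segmentation from endoscopic images, the thesis proposes the spatial relation reasoning network, which captures long-range semantic relationships to improve the recognition of local features. The network contains two new modules, the space squeeze reasoning module and the low-rank bilinear fusion module. The space squeeze reasoning module first squeezes the feature map along the vertical and horizontal directions to capture long-range semantic information in each direction. It then computes the similarity between every horizontal position and every vertical position to build a spatial relation matrix. Finally, it models inter-channel relationships to guide how the spatial relation matrix is adaptively distributed back to the original feature space. In this way the module captures long-range semantic relationships and resolves the local feature similarity problem. The low-rank bilinear fusion module fuses shallow and deep features with a low-rank bilinear model, which extracts fine-grained features and sharpens the distinction between different semantic features. Experiments show that the spatial relation reasoning network achieves excellent segmentation performance on soft tissue segmentation. (4) To deploy segmentation models on mobile minimally invasive surgical robots, the thesis proposes the class-wise self-distillation algorithm for model compression. The algorithm is placed in the decoder of the segmentation model and transfers knowledge in a multi-stage, top-down manner. The number of channels of the distilled features is set equal to the total number of segmentation classes, so each channel reflects the feature distribution of one class. Supervising a specific channel therefore teaches the intra-class feature distribution of the corresponding class without interference from the distributions of other classes. The student feature map can thus better learn the feature distribution of the teacher feature map, which improves the network's feature representation. The pyramid attention aggregation network and the spatial relation reasoning network were compressed with this algorithm. Experiments show that class-wise self-distillation can improve segmentation accuracy while significantly reducing the model's computational complexity and parameter count.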
English abstract	Minimally invasive surgery has the advantages of small incisions, a low infection rate, and quick postoperative recovery, and it has broad prospects for development. Minimally invasive surgical robots are now widely used in minimally invasive surgery. They have significantly improved surgeons' working environment and the safety of surgery. However, the use of robots increases the learning cost for surgeons. Developing a navigation system for minimally invasive surgical robots can help shorten the surgeon's learning period and improve the safety of surgery. The core difficulty of such navigation is that the robot needs visual perception comparable to a surgeon's. Specifically, the robot must be able to obtain the position of key objects from endoscopic images. Semantic segmentation methods can help robots identify objects and locate their boundaries. This thesis therefore focuses on the segmentation of key regions in endoscopic images, aiming to provide algorithmic support for the navigation of minimally invasive surgical robots. The main contents and contributions of the thesis are as follows:
(1) To address illumination problems such as reflections and shadows in surgical instrument segmentation from endoscopic images, this thesis proposes a pyramid attention aggregation network, which improves feature representation by aggregating multi-scale attention features. The network includes two new modules: the double attention module and the pyramid upsampling module. The double attention module captures the semantic relationships between positions and between channels in order to infer semantic information in illumination-disturbed regions, thereby improving the segmentation accuracy of surgical instruments under reflections and shadows. The pyramid upsampling module aggregates multi-scale attention features to capture local details and global information, which further improves the feature representation of the network. The pyramid attention aggregation network achieves state-of-the-art performance on the endoscopic surgical instrument segmentation dataset EndoVis 2017.
(2) To address scale variation and small-instrument segmentation in endoscopic surgical instrument segmentation, this thesis proposes an adaptive receptive field network, which adaptively selects appropriate receptive fields for surgical instruments of different scales. In this network, the adaptive receptive field module selects features with suitable receptive fields to adapt to the scale variation of surgical instruments. It takes multi-scale features as input and uses the relationships between channels to filter them. The module refines multi-scale features and covers a wider range of scales, which also helps with the segmentation of small surgical instruments. In addition, the bilinear attention module is designed to capture the semantic relationships between channels, making the network pay more attention to key regions. This module uses channel relationships to strengthen the network's recognition of surgical instruments instead of identifying them only from local color and texture features. It is therefore not disturbed by reflections and shadows, which helps to solve illumination problems. Moreover, the adaptive receptive field network adopts a dilated residual network to retain more detailed features, improving the segmentation accuracy of small surgical instruments. Experimental results show that the adaptive receptive field network achieves state-of-the-art performance on the endoscopic surgical instrument segmentation dataset EndoVis 2018.
(3) To address the local feature similarity issue in soft tissue segmentation from endoscopic images, this thesis proposes a spatial relation reasoning network, which captures long-range semantic relations to improve the identification of local features. It contains two new modules: the space squeeze reasoning module and the low-rank bilinear fusion module. The space squeeze reasoning module first squeezes the feature maps in the vertical and horizontal directions, capturing long-range semantic information in each direction. Then, the similarity between each horizontal position and each vertical position is calculated to establish a spatial relationship matrix. Finally, the relationships between channels are modeled to guide the adaptive distribution of the spatial relationship matrix to the original feature space. In this way, the space squeeze reasoning module captures long-range semantic relations and addresses the local feature similarity issue. In addition, the low-rank bilinear fusion module uses a low-rank bilinear model to fuse shallow and deep features, which extracts fine-grained features and sharpens the distinction between different semantic features. Experimental results show that the spatial relation reasoning network achieves excellent segmentation performance on soft tissue segmentation tasks.
(4) To deploy segmentation models on mobile minimally invasive surgical robots, this thesis proposes the class-wise self-distillation algorithm for model compression. The algorithm is placed in the decoder of the segmentation model and adopts a multi-stage, top-down approach to knowledge transfer. The number of channels of the distilled features is set equal to the total number of segmentation classes, so each channel reflects the feature distribution of a specific class. The intra-class feature distribution of each class can therefore be learned by supervising the corresponding channel, without being disturbed by the feature distributions of other classes. In this way, the student feature map can better learn the feature distribution of the teacher feature map, which improves the feature representation of the network. The pyramid attention aggregation network and the spatial relation reasoning network are compressed with the class-wise self-distillation algorithm. The experimental results show that the algorithm can improve the segmentation accuracy of the model while significantly reducing its computational complexity and size.
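The first two steps of the space squeeze reasoning module described in item (3) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not say how the feature map is squeezed or how similarity is measured, so the sketch assumes mean pooling and a plain dot product, and the function names `squeeze` and `spatial_relation_matrix` are hypothetical. The final step, in which channel relationships guide the distribution of the matrix back to the original feature space, is omitted.

```python
def squeeze(feature):
    """Squeeze a C x H x W feature map (nested lists) along each spatial axis.

    Assumption: squeezing is done by mean pooling.
    Returns (rows, cols) where rows[h] is a C-vector averaged over the width
    and cols[w] is a C-vector averaged over the height.
    """
    channels = len(feature)
    height = len(feature[0])
    width = len(feature[0][0])
    rows = [
        [sum(feature[c][h]) / width for c in range(channels)]
        for h in range(height)
    ]
    cols = [
        [sum(feature[c][h][w] for h in range(height)) / height
         for c in range(channels)]
        for w in range(width)
    ]
    return rows, cols


def spatial_relation_matrix(feature):
    """Build an H x W matrix of similarities between squeezed positions.

    Assumption: similarity is the dot product of the two C-vectors.
    Entry [h][w] relates squeezed row h to squeezed column w.
    """
    rows, cols = squeeze(feature)
    return [
        [sum(r * c for r, c in zip(row, col)) for col in cols]
        for row in rows
    ]
```

Calling `spatial_relation_matrix` on a feature map with C channels, H rows, and W columns returns an H x W nested list, so every row position is related to every column position regardless of how far apart they are in the image.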
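Keywords	endoscopy, image segmentation, self-distillation algorithm
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/48658
Collection	Graduates_Doctoral dissertations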
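Recommended citation (GB/T 7714)	Ni Zhenliang. Research on Key Region Segmentation Methods for Endoscopic Images in Minimally Invasive Surgery[D]. Institute of Automation, Chinese Academy of Sciences. Institute of Automation, Chinese Academy of Sciences, 2022.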

Files in this item
File name/size	Document type	Version type	Access type	License
倪振梁博士学位论文_0611.pdf (47645KB)	Thesis		Restricted access	CC BY-NC-SA