Research on Surgical Operation Recognition Methods in Cataract Microsurgery Scenes (白内障显微手术场景中的手术操作识别方法研究)
陈华斌
2022-05-19
Pages: 98
Subtype: Master's
Abstract

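Cataract microsurgery is one of the most common ophthalmic surgeries and the primary treatment for cataracts. Improving the success rate of cataract surgery and patient prognosis is an important clinical research topic. As the population ages, the number of cataract patients increases year by year, and the shortage of surgeons is becoming increasingly severe. One effective way to address these problems is to develop a cataract microsurgery robot with autonomous navigation capability. To achieve more intelligent robot-assisted surgery, the robot must be able to perceive and understand the complex surgical workflow. A robot produces rich multi-modal data during surgery; image data is the easiest to acquire and the least costly of these modalities, so image-based surgical operation recognition can substantially reduce development costs. Supported by projects including the Key Program of the National Natural Science Foundation of China "Research on Key Problems of Image Processing and Autonomous Navigation for Cataract Microsurgery Robots" (U20A20196), this thesis pursues the goal of giving cataract microsurgery robots intelligent visual perception. Taking into account the specific needs of surgeons performing cataract surgery, it studies surgical phase recognition, surgical instrument recognition, and instrument-phase tuple recognition based on microscope images. The main contents and contributions of the thesis are as follows: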

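(1) For image-based cataract surgical phase recognition, a two-stage spatio-temporal causal Transformer network is proposed. The method uses different types of Transformer to decompose the spatial and temporal dimensions of surgical operations. First, the spatial Transformer applies transfer learning, using a pre-trained vision Transformer as a spatial feature extractor to model the spatial token embeddings at the same temporal index. Compared with conventional convolutional neural networks, it can model the global spatial relationships of an image. Second, the temporal Transformer learns the temporal context of surgical phases by aggregating the spatial token embeddings of frames at different temporal indexes. Finally, a dual pyramid structure is designed to capture multi-scale temporal context, alleviating the problem of large variation in phase duration. The network uses its parameters efficiently and achieves excellent phase recognition performance on two surgical datasets of different granularity.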

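(2) For image-based cataract surgical instrument recognition, an end-to-end recurrent graph convolutional network is proposed. The method combines spatial correlation reasoning with temporal reasoning. First, a prior graph of surgical instruments is constructed to introduce instrument co-occurrence knowledge, and the latent spatial correlations among instruments are modeled through graph reasoning. Second, a recurrent neural network captures temporal context from temporally continuous spatial features. Finally, the visual features of the instruments are corrected and aggregated by combining instrument-related semantics with temporal context. Experimental results show that the method effectively combines the two kinds of reasoning, alleviates the problem of instrument deformation and occlusion, and improves instrument recognition accuracy.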

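(3) For image-based cataract surgery instrument-phase tuple recognition, an end-to-end SlowFast distillation network is proposed. The method is a novel hard-parameter-sharing multi-task architecture that uses a two-stream network with different temporal resolutions to decompose the spatio-temporal features of instruments and surgical phases. First, shared SlowFast encoders provide shallow shared features for both tasks, and task-specific layers learn the high-level semantic features specific to each task. Second, to alleviate feature-sharing conflicts, a task-specific distillation knowledge transfer module is designed: the shallow slow and fast paths learn high-level semantic knowledge from the deeper task-specific branches, which strengthens the ability of the SlowFast encoders to model different types of features. Finally, the dynamic weight averaging method is used to adjust the multiple loss weights dynamically. Experimental results show that the model achieves accurate recognition of cataract surgical phases under short-term temporal dependencies.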
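
As an illustration of the instrument prior graph in contribution (2), the following is a minimal sketch of how co-occurrence knowledge might be turned into graph edges and used for one simplified reasoning step. It is not the thesis's network: the function names `cooccurrence_prior` and `propagate`, the conditional-probability edge weighting, the self-loop, and the toy labels are all assumptions made for illustration, and a real graph convolutional layer would also apply learned weights and a nonlinearity.

```python
from typing import List


def cooccurrence_prior(labels: List[List[int]]) -> List[List[float]]:
    """Estimate P(instrument j present | instrument i present).

    `labels` holds one multi-hot row per frame (1 = instrument visible).
    The diagonal is left at 0; self-loops are added in `propagate`.
    """
    n = len(labels[0])
    counts = [0] * n                      # frames in which instrument i appears
    joint = [[0] * n for _ in range(n)]   # frames in which i and j appear together
    for frame in labels:
        present = [k for k, v in enumerate(frame) if v]
        for i in present:
            counts[i] += 1
            for j in present:
                if i != j:
                    joint[i][j] += 1
    return [
        [joint[i][j] / counts[i] if counts[i] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def propagate(adj: List[List[float]], feats: List[List[float]]) -> List[List[float]]:
    """One simplified graph-reasoning step (assumed form, no learned weights).

    Each node's new feature is the weighted average of its own feature
    (self-loop weight 1.0) and its neighbours' features (prior weights).
    """
    n, dim = len(adj), len(feats[0])
    out = []
    for i in range(n):
        weights = list(adj[i])
        weights[i] = 1.0
        total = sum(weights)
        out.append(
            [sum(weights[j] * feats[j][d] for j in range(n)) / total for d in range(dim)]
        )
    return out


# Toy example: 4 frames, 3 hypothetical instruments.
toy_labels = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 0]]
toy_prior = cooccurrence_prior(toy_labels)
toy_feats = propagate(toy_prior, [[1.0], [0.0], [0.0]])
```

In this toy setting, an instrument that often appears together with another receives part of that instrument's feature, which is the intuition behind using co-occurrence to compensate for deformed or occluded instruments.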

 

Other Abstract

Cataract microsurgery is one of the most common ophthalmic surgeries and the primary treatment for cataracts. Improving the success rate of cataract surgery and the prognosis of patients is an important clinical research topic. As the population ages, the number of cataract patients increases every year, and the shortage of surgeons grows more severe. One effective way to address these problems is to develop a cataract microsurgical robot with autonomous navigation. More intelligent robot-assisted surgery requires the robot to perceive and understand the complex surgical workflow. Robots generate rich multi-modal data during surgery, of which video is the most easily accessible and least expensive modality, so video-based surgical operation recognition can significantly reduce research and development costs. Supported by the National Natural Science Foundation of China (U20A20196), this dissertation aims to give cataract microsurgery robots intelligent visual perception. Taking into account the specific needs of surgeons performing cataract surgery, the dissertation focuses on microscope video-based surgical phase recognition, surgical instrument recognition, and instrument-phase tuple recognition in cataract surgery. The main contributions and innovations of this dissertation are as follows:

(1) A two-stage spatio-temporal causal Transformer is proposed for video-based cataract surgical phase recognition. The method uses different types of Transformer to decompose the spatial and temporal dimensions of the surgical phase. First, through transfer learning, the spatial Transformer uses a pre-trained vision Transformer as a spatial feature extractor that models the spatial tokens at the same temporal index. Compared with traditional convolutional neural networks, it can model the global spatial relationships of images. Second, the temporal Transformer learns the temporal context of the surgical phase by aggregating the spatial tokens of frames at different temporal indexes. Finally, to alleviate the large variability in phase duration, a dual pyramid structure is applied to the temporal Transformer to capture multi-scale temporal context. The proposed method is parameter-efficient and achieves excellent surgical phase recognition performance on two surgical datasets of different granularity.

(2) An end-to-end recurrent graph convolutional network, which combines spatial correlation reasoning and temporal reasoning, is proposed for video-based cataract surgery instrument recognition. First, a prior graph of instruments is built to introduce knowledge of instrument co-occurrence, and potential spatial correlations between instruments are modeled by graph reasoning. Second, a recurrent neural network captures the temporal context from temporally continuous spatial features. Finally, the instrument visual features are corrected and aggregated by combining instrument-related semantics with temporal context. The experiments show that the method effectively combines the two kinds of reasoning, alleviates the problem of instrument deformation and occlusion, and improves the accuracy of instrument recognition.

(3) An end-to-end SlowFast distillation network is proposed for video-based cataract surgery instrument-phase tuple recognition. The method is a novel hard-parameter-sharing multi-task architecture that decomposes the spatio-temporal features of the instrument and the phase using two-stream networks with different temporal resolutions. First, shared SlowFast encoders provide shallow shared features for both tasks, and task-specific layers learn the high-level semantic features that belong to each task. Second, to alleviate feature-sharing conflicts, a task-specific distillation knowledge transfer module is designed: the shallow slow and fast paths learn high-level semantic knowledge from the deeper task-specific branches, which enhances the ability of the SlowFast encoders to model different types of features. Finally, the multiple loss weights are adjusted dynamically using the dynamic weight averaging method. The experiments show that the model achieves high-accuracy phase recognition in cataract surgery with only short-term temporal dependencies.
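
The loss-weighting step in contribution (3) can be sketched as follows. This is a minimal illustration of dynamic weight averaging in its commonly published form, where each task's weight is a softmax over the ratio of its two most recent losses, scaled so the weights sum to the number of tasks. The abstract does not give the thesis's exact settings, so the function name `dwa_weights`, the temperature of 2.0, and the example loss values are assumptions.

```python
import math
from typing import List, Sequence


def dwa_weights(
    prev_losses: Sequence[float],
    prev2_losses: Sequence[float],
    temperature: float = 2.0,  # assumed value; not stated in the abstract
) -> List[float]:
    """Dynamic weight averaging for K task losses.

    r_k = L_k(t-1) / L_k(t-2) measures how fast task k is improving;
    w_k = K * exp(r_k / T) / sum_i exp(r_i / T).
    A task whose loss is falling more slowly gets a larger weight.
    Losses from step t-2 are assumed to be non-zero.
    """
    ratios = [a / b for a, b in zip(prev_losses, prev2_losses)]
    exps = [math.exp(r / temperature) for r in ratios]
    total = sum(exps)
    k = len(ratios)
    return [k * e / total for e in exps]


# Hypothetical losses for the two tasks (instrument, phase) at steps t-1 and t-2.
w_instrument, w_phase = dwa_weights([0.9, 0.5], [1.0, 1.0])
# The combined loss at step t would then be
#   w_instrument * loss_instrument + w_phase * loss_phase
```

For the first two training steps, where no loss history exists, the ratios are usually set to 1 so that all tasks start with equal weight.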

 

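Keyword: cataract microsurgery; robot-assisted surgery; surgical phase recognition; surgical instrument recognition; tuple recognition
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/48502
Collection: Complex Systems Cognition and Decision-Making Laboratory_Advanced Robotics
Graduates_Master's Theses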
Recommended Citation
GB/T 7714
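陈华斌 (Chen Huabin). Research on Surgical Operation Recognition Methods in Cataract Microsurgery Scenes [D]. Institute of Automation, Chinese Academy of Sciences. Institute of Automation, Chinese Academy of Sciences, 2022.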
Files in This Item:
File Name/Size: 白内障显微手术场景中的手术操作识别方法研 (19218KB)
DocType: Thesis
Access: Open Access
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.