Research on Cervical Cancer Lymph Node Metastasis Prediction Based on Contrastive Representation Federated Learning

	Research on Cervical Cancer Lymph Node Metastasis Prediction Based on Contrastive Representation Federated Learning
	刘圣圆
	2024-05
Pages	54
Degree type	Master's
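Abstract (translated from Chinese)	Cervical cancer is a common malignant tumor in women, and lymph node metastasis is its main route of spread; early detection and timely treatment of lymph node metastasis are crucial for improving patient survival. At present, imaging examination is the usual method for non-invasive preoperative assessment of lymph node status in cervical cancer, but the diagnostic process is time-consuming and laborious and depends on the clinician's experience, which makes it prone to subjective error. In recent years, advances in deep learning and related techniques have brought new methods and ideas to the preoperative diagnosis of lymph node metastasis in cervical cancer. Although several studies have shown that deep learning is effective for this task, the approach is constrained by data security and privacy protection, so large-scale datasets remain hard to obtain, and most of these studies train and validate on local single-center datasets or on centralized datasets. Federated learning, an emerging distributed learning method, distributes model training across multiple centers and aggregates models while the data stay separate, and it is gradually becoming a mainstream solution to data privacy and security problems. No federated learning study yet exists for cervical cancer lymph node metastasis prediction. Moreover, patient imaging data usually come from different centers that may use different acquisition devices and parameters, so mitigating the effect of non-independent and identically distributed (non-IID) multi-center data on federated training is also an urgent problem.
To address these problems, this thesis combines radiomics and deep learning and proposes a multi-task learning network for cervical cancer lymph node metastasis prediction that effectively extracts salient features from computed tomography (CT) images of cervical cancer. To address data security and privacy protection, it also proposes a prediction framework based on contrastive representation federated learning and validates it on a multi-center cervical cancer dataset. The results show that the framework improves prediction accuracy and is more adaptable and robust to multi-center non-IID data. Specifically, the work comprises three parts.
(1) CT images and clinical records of 1033 cervical cancer patients were collected from eight hospitals in China, and clinicians annotated the tumor lesion regions, yielding a relatively large multi-center dataset. Preprocessing by grayscale conversion, normalization, and data augmentation reduced imbalance and noise in the data and produced a high-quality, standardized dataset. The non-IID problem in this multi-center dataset was analyzed and its effect on model performance discussed, providing the data foundation for the subsequent work.
(2) To address the clinical difficulty of predicting lymph node metastasis in cervical cancer and the technical problem that existing models perform poorly, this study proposes MRCNet, a multi-task learning network for cervical cancer lymph node metastasis prediction. The model builds on the U-Net architecture and introduces a multi-task loss function to fully exploit the intrinsic relationship between lymph node metastasis status and the tumor lesion region of interest (ROI). In addition, a cross-attention module is designed to fuse features along the spatial and temporal dimensions, strengthening the model's feature perception so that it predicts lymph node metastasis more accurately. Experiments show that the multi-task MRCNet improves accuracy by 4.58% over the single-task model and by 11.28% over a radiomics model built on hand-crafted predefined features, confirming its superiority and feasibility for this task.
(3) To address the data security and privacy problems of centralized learning for cervical cancer and the non-IID problem of multi-center datasets, this study proposes PCRFed, a cervical cancer lymph node metastasis prediction framework based on contrastive representation federated learning. On top of the federated averaging strategy, the framework partitions off private layers of the model. This personalized learning approach helps improve generalization across data centers, with a more pronounced advantage for centers with scarce samples. A weighted contrastive representation loss is also proposed; it adaptively adjusts the loss weight according to each center's dataset size, helping the model learn the differences between data centers while making full use of global information to guide convergence. Experiments show that, with both improvements, PCRFed (AUC: 0.6705) outperforms the local learning strategy (AUC: 0.6216) and the centralized learning strategy (AUC: 0.6690). This contrastive-representation-based federated learning algorithm adapts better to data centers that have little data and are hard to classify, improving generalization across data centers.
In summary, this thesis centers on cervical cancer lymph node metastasis prediction and proposes novel optimization algorithms for both the prediction model and the federated learning framework, effectively improving AI-assisted diagnosis of lymph node metastasis in multi-center cervical cancer data. The proposed methods help improve the accuracy of cervical cancer diagnosis and treatment and offer new ideas for applying federated learning in medicine. The related work has been published, with the author as first author, in Medical Physics, a mainstream SCI journal in the medical-engineering field, and at the IEEE International Symposium on Biomedical Imaging.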
English abstract	Cervical cancer is a common malignant tumor in women, and lymph node metastasis is its main route of spread. Early detection and timely treatment of lymph node metastasis are crucial to improving patient survival. Currently, imaging examination is the usual non-invasive method for preoperative evaluation of lymph node status in cervical cancer. However, it relies on the diagnostic experience of clinicians, and the diagnostic process is time-consuming, laborious, and prone to subjective error. In recent years, the development of deep learning and related technologies has brought new methods and ideas to the preoperative diagnosis of lymph node metastasis in cervical cancer. Although some studies have shown that deep learning is effective for this task, most of them train and validate on local single-center datasets or on centralized datasets. The approach is limited by data security and privacy protection, and acquiring large-scale datasets remains difficult. Federated learning, an emerging distributed learning method, distributes model training across multiple centers and aggregates models while keeping the data separate, and it is gradually becoming a mainstream solution to data privacy and security issues. Currently, there is no federated learning research on the prediction of lymph node metastasis in cervical cancer. In addition, patient imaging data usually come from different centers, which may use different acquisition devices and parameters. How to mitigate the impact of non-independent and identically distributed (non-IID) multi-center data on federated learning training is also a pressing issue.
To address the above issues, this thesis integrates radiomics and deep learning methods and proposes a cervical cancer lymph node metastasis prediction network based on multi-task learning, which effectively extracts salient features from cervical cancer computed tomography (CT) images. To address data security and privacy protection, it also presents a cervical cancer lymph node metastasis prediction framework based on contrastive representation federated learning and validates it on a multi-center cervical cancer dataset. The results show that this framework improves prediction accuracy and enhances adaptability and robustness to multi-center non-independent and identically distributed (non-IID) data. Specifically, the work comprises the following three parts.
(1) CT images and clinical data of 1033 cervical cancer patients were collected from eight Chinese hospitals, and clinicians annotated the tumor lesion regions, producing a relatively large multi-center dataset. Preprocessing by grayscale conversion, normalization, and data augmentation reduced imbalance and noise in the data and yielded a high-quality, standardized dataset. The non-IID problem in this multi-center dataset was analyzed and its impact on model performance discussed, laying a data foundation for the subsequent research.
(2) In response to the clinical difficulty of predicting lymph node metastasis in cervical cancer and the poor performance of existing models, this study proposes MRCNet, a cervical cancer lymph node metastasis prediction network based on multi-task learning. The model is based on the U-Net architecture and introduces a multi-task loss function to exploit the inherent correlation between lymph node metastasis status and the tumor lesion region of interest (ROI). Additionally, a cross-attention module is designed that integrates spatial and temporal features to enhance the model's perception of features and thus predict lymph node metastasis status more accurately. Experimental results show that the multi-task MRCNet improves accuracy by 4.58% over the single-task model and by 11.28% over radiomics models with manually predefined features, effectively enhancing diagnostic accuracy and sensitivity.
(3) To address the data security and privacy protection issues of centralized learning for cervical cancer and the non-IID problem in multi-center datasets, this study proposes PCRFed, a cervical cancer lymph node metastasis prediction framework based on contrastive representation federated learning. Building on the federated averaging strategy, the framework partitions off private layers of the model. This personalized learning approach helps improve generalization across data centers, with the largest benefit in hard-to-classify and sample-scarce settings. Furthermore, a weighted contrastive representation loss function is introduced that adaptively adjusts the loss weights according to the size of each center's dataset, helping the model learn the differences among data centers and use global information to guide convergence. With these two improvements combined, the prediction performance of PCRFed (AUC: 0.6705) surpasses the local learning strategy (AUC: 0.6216) and the centralized learning strategy (AUC: 0.6690). The experimental results demonstrate that this contrastive-representation-based federated learning algorithm adapts better to data centers with little data that are hard to classify, enhancing generalization across data centers.
In conclusion, this thesis focuses on predicting cervical cancer lymph node metastasis and proposes novel optimization algorithms for both the prediction model and the federated learning framework, effectively enhancing AI-assisted diagnosis of lymph node metastasis in multi-center cervical cancer data. The proposed methods help improve the accuracy of cervical cancer diagnosis and treatment and provide new ideas for the application of federated learning in the medical field. The related work has been published, with the author as first author, in Medical Physics, a mainstream SCI journal in the medical-engineering field, and presented at the IEEE International Symposium on Biomedical Imaging.
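Part (3) of the abstract describes federated averaging with some layers kept private to each center. The following is a minimal sketch of that general idea only, not the PCRFed implementation: the abstract does not say which layers are private, so the `head.` prefix, the layer names, and both function names are hypothetical, and weights are plain Python lists to keep the example dependency-free.

```python
from typing import Dict, List, Sequence, Tuple

# Layer name -> flattened weights (illustrative representation, an assumption).
Params = Dict[str, List[float]]


def aggregate_shared_layers(
    client_params: Sequence[Params],
    client_sizes: Sequence[int],
    private_prefixes: Tuple[str, ...] = ("head.",),  # hypothetical naming
) -> Params:
    """Dataset-size-weighted average of the shared layers only.

    Layers whose names start with a private prefix are skipped, so each
    center keeps its own copy of them.
    """
    total = sum(client_sizes)
    shared_names = [
        name for name in client_params[0] if not name.startswith(private_prefixes)
    ]
    global_shared: Params = {}
    for name in shared_names:
        length = len(client_params[0][name])
        global_shared[name] = [
            sum(p[name][i] * s for p, s in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return global_shared


def apply_global(local: Params, global_shared: Params) -> Params:
    """Overwrite shared layers with the aggregate; private layers are untouched."""
    updated = dict(local)
    updated.update(global_shared)
    return updated


if __name__ == "__main__":
    # Two toy centers with 1 and 3 samples respectively.
    center_a = {"encoder.w": [0.0, 4.0], "head.w": [1.0]}
    center_b = {"encoder.w": [4.0, 0.0], "head.w": [2.0]}
    shared = aggregate_shared_layers([center_a, center_b], [1, 3])
    print(apply_global(center_a, shared))
```

In this sketch the larger center dominates the shared encoder weights, while each center's `head.w` never leaves the site. The weighted contrastive representation loss mentioned in the abstract is not shown, because the record gives no formula for it.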
Keywords	Cervical cancer ; Lymph node metastasis ; Radiomics ; Deep learning ; Federated learning
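Subject area	Artificial intelligence ; Pattern recognition
Discipline category	Engineering::Control Science and Engineering
Language	Chinese
Document type	Thesis
Identifier	http://ir.ia.ac.cn/handle/173211/56507
Collection	Graduates_Master's Theses
Recommended citation (GB/T 7714)	刘圣圆. 基于对比表征联邦学习的宫颈癌淋巴结转移预测研究[D],2024.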

Files in this item
File name/size	Document type	Version type	Access type	License
硕士毕业论文_Lsy.pdf (29707KB)	Thesis		Restricted access	CC BY-NC-SA