Research on Deep Learning-Based Algorithms for Small-Sample Tumor CT Image Analysis (基于深度学习的小样本肿瘤CT影像分析算法研究)

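Title	Research on Deep Learning-Based Algorithms for Small-Sample Tumor CT Image Analysis
Author	王硕 (Wang Shuo)
Date	2019-05-30
Pages	134
Degree type	Doctoral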
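Abstract (translated from the Chinese)	Computed tomography (CT) is a commonly used tool in cancer analysis. It shows, at a macroscopic level, the full extent of a tumor and the microenvironment of its surrounding tissue, which makes it an important noninvasive way to observe and analyze tumors. In recent years, the rapid development of deep learning has greatly changed existing image analysis methods. As a data-driven, self-learning model, deep learning needs a large amount of training data to perform well; however, standardized tumor CT images are usually scarce, so deep learning faces small-sample problems in tumor CT image analysis. This dissertation addresses the issue with new methods from three perspectives: 1) converting patient-level tumor analysis into voxel-level classification, thereby enlarging the number of labeled samples; 2) using transfer learning to improve how convolutional neural networks are trained, reducing the overfitting caused by small samples; 3) proposing a semi-supervised learning framework that makes effective use of unlabeled data for network training. The main work and contributions are as follows. 1. Voxel-level classification algorithms based on MV-CNN and CF-CNN. To address the scarcity of patient-level tumor CT data, the patient-level tumor analysis problem is first converted into a voxel-level classification problem in which every voxel in the tumor is treated as a training sample, enlarging the sample size. The MV-CNN and CF-CNN networks are then proposed to suit the characteristics of voxel-level classification. MV-CNN extracts multi-scale voxel image information from three orthogonal views of the CT image. In CF-CNN, a central pooling layer performs adaptive non-uniform pooling according to spatial position in the image, and a two-branch structure fuses multi-scale two-dimensional and three-dimensional information. For training-sample selection, a weighted sampling strategy based on classification difficulty is proposed. In lung tumor segmentation, the proposed MV-CNN reaches a Dice coefficient of 77.67% on the LIDC dataset, 8% higher than traditional segmentation methods; the proposed CF-CNN reaches Dice coefficients above 80% on both the LIDC and GDGH datasets, more than 6% higher than traditional image segmentation methods and more than 3% higher than other published deep learning models such as U-Net. 2. A transfer learning method for tumor CT images. Because small-sample data easily drives convolutional neural network training into local extrema, part of the weights of a network pre-trained on natural images is first used to initialize the newly designed network, and the deep convolutional neural network is then retrained on the target dataset in two stages. Based on this method, the dissertation predicts EGFR gene mutation in lung cancer. On an independent test set of 241 lung cancer patients the method achieves AUC = 0.81, 17% higher than the algorithms published so far for this task. In addition, through a visualization algorithm for convolutional neural networks, the model can locate regions of the tumor with high suspicion of EGFR mutation, which helps in choosing the puncture site for clinical tissue biopsy. 3. A semi-supervised learning framework for tumor CT images. Because labeled data are scarce in tumor prognosis tasks, an auto-encoder learns features from a relatively large amount of unlabeled data, and a small amount of labeled data is then used to improve the predictive performance of the model. Based on this framework, two network structures, RCAE and DenseCAE, are designed and applied to predicting overall survival in lung cancer and time to recurrence in ovarian cancer. For overall survival of lung cancer patients the method achieves C-Index = 0.710, 8% higher than the algorithms in published articles; for time to recurrence of ovarian cancer patients it achieves a C-Index of 0.713. Further Kaplan-Meier analysis, log-rank tests, and calibration curve analysis all confirm the effectiveness of the method. Centered on the small amount of standardized tumor CT image data, this dissertation studies three directions (enlarging the labeled data, changing how the network is trained, and exploiting unlabeled data), proposes deep learning models suited to different scenarios, and provides effective solutions for the clinical tumor analysis tasks of lung tumor segmentation, lung cancer EGFR mutation prediction, lung cancer overall survival prediction, and ovarian cancer recurrence time prediction.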
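The first contribution in the abstract above treats every voxel as its own training sample and looks at it from three orthogonal views. The following is a minimal pure-Python sketch of that idea, not the dissertation's implementation: the function names `orthogonal_patches` and `voxel_samples`, the `half` patch radius, the constant padding at the volume border, and the `volume[z][y][x]` indexing are all assumptions made for illustration.

```python
def orthogonal_patches(volume, z, y, x, half=1, pad=0.0):
    """Return the axial, coronal and sagittal 2D patches centred on one voxel.

    `volume` is a nested list indexed as volume[z][y][x] (an assumed layout).
    Each patch has (2 * half + 1) rows and columns; positions that fall
    outside the volume are filled with `pad`.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])

    def at(k, j, i):
        # Read one voxel, falling back to the padding value outside the volume.
        if 0 <= k < depth and 0 <= j < height and 0 <= i < width:
            return volume[k][j][i]
        return pad

    offsets = range(-half, half + 1)
    axial = [[at(z, y + dj, x + di) for di in offsets] for dj in offsets]
    coronal = [[at(z + dk, y, x + di) for di in offsets] for dk in offsets]
    sagittal = [[at(z + dk, y + dj, x) for dj in offsets] for dk in offsets]
    return axial, coronal, sagittal


def voxel_samples(volume, mask, half=1):
    """Yield one (patches, label) training sample per voxel.

    `mask` has the same shape as `volume`; the label of a sample is the
    mask value (1 = tumor, 0 = background) at the centre voxel.
    """
    for z, plane in enumerate(mask):
        for y, row in enumerate(plane):
            for x, label in enumerate(row):
                yield orthogonal_patches(volume, z, y, x, half), int(label)
```

Under these assumptions a single 3 x 3 x 3 volume already yields 27 labeled samples rather than one patient-level sample, which is the sample-expansion effect the abstract describes.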
English abstract	Computed tomography (CT), a commonly used tool in cancer analysis, can visualize the complete appearance of tumors and their surrounding tissues at a macroscopic level. Consequently, CT imaging provides an important means of observing and analyzing tumors noninvasively. In recent years, the rapid development of deep learning has greatly changed existing image analysis methods. As a data-driven, self-learning model, deep learning needs a large amount of training data to perform well. However, standardized CT images of tumors are very limited, which limits the performance of deep learning in CT image analysis of tumors. To solve this problem, this dissertation proposed new deep learning methods from the following three perspectives: 1) converting patient-level tumor analysis tasks into voxel-level classification tasks, to enlarge the amount of labelled data; 2) using transfer learning to optimize the training process of deep learning, to avoid the overfitting caused by small training sets; 3) proposing a semi-supervised deep learning framework that uses a large amount of unlabelled data for network training. The main innovations and contributions of this study are as follows: 1. This dissertation proposed voxel-level classification algorithms based on MV-CNN and CF-CNN. Because standardized tumor CT images are limited at the patient level, this dissertation converted the patient-level tumor analysis task into a voxel-level classification task. Each voxel in the tumor was treated as a training sample, which greatly enlarged the amount of data. This dissertation then proposed the MV-CNN and CF-CNN networks, which were designed specifically for voxel-level classification tasks. The MV-CNN can extract multi-scale information from three orthogonal views of CT images. In the CF-CNN, this dissertation proposed a central pooling layer, which performed adaptive non-uniform pooling according to the spatial location of the image voxels.
In addition, the CF-CNN combined multi-scale two-dimensional and three-dimensional information through two CNN branches. When choosing training samples, this dissertation proposed a weighted sampling strategy based on the classification difficulty of each voxel. In the lung tumor segmentation task, the MV-CNN achieved a Dice coefficient of 77.67% on the LIDC dataset, which was 8% higher than traditional segmentation methods. On both the LIDC and GDGH datasets, the CF-CNN achieved a Dice coefficient of over 80%, which was more than 6% higher than traditional segmentation methods and more than 3% higher than other deep learning models such as U-Net. 2. This dissertation proposed a transfer learning method for CT image analysis of tumors. To address the overfitting of deep learning caused by limited training data, this dissertation initialized part of the newly designed CNN layers with weights from another network that had been pre-trained on natural images. Then, this dissertation used a two-stage training approach to fine-tune the newly designed CNN on the target dataset. Based on this method, this dissertation predicted the EGFR mutation status of lung cancer from CT images. In an independent test set of 241 lung cancer patients, this method achieved an AUC of 0.81, which was 17% higher than the commonly used methods for this task. In addition, through a CNN visualization algorithm, the model could locate suspicious regions in the tumor where EGFR mutation is likely. These suspicious regions can help clinicians choose the biopsy location. 3. This dissertation proposed a semi-supervised learning framework. To train the deep learning model with very limited labelled data for tumor prognostic analysis, this dissertation used an auto-encoder to learn features from a relatively large unlabelled dataset, and then combined them with a small labelled dataset to improve the predictive performance of the deep learning model.
Based on this semi-supervised learning framework, this dissertation designed two networks, RCAE and DenseCAE, and applied them to predict the overall survival of lung cancer and the recurrence of ovarian cancer. When predicting the overall survival of lung cancer, this method achieved a C-Index of 0.710, which was 8% higher than the published method. When predicting the recurrence of ovarian cancer, this method achieved a C-Index of 0.713. Further Kaplan-Meier analysis, log-rank tests, and calibration curve analysis also demonstrated the effectiveness of this method. Focusing on the problems caused by the small amount of standardized tumor CT images, this dissertation proposed three approaches (expanding the labelled dataset, changing the network training process, and using unlabelled data) together with corresponding deep learning models suited to different scenarios. These methods provide effective solutions for clinical tasks such as lung tumor segmentation, predicting EGFR mutation status in lung cancer, predicting overall survival in lung cancer, and predicting recurrence of ovarian cancer.
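The abstract reports segmentation quality as a Dice coefficient and prognostic performance as a C-Index. As a reminder of what those numbers measure, here is a small pure-Python sketch using the standard definitions: Dice overlap of two binary masks, and Harrell's concordance index for right-censored data. The function names, the flat-list inputs, and the convention of counting tied risk scores as 0.5 are illustrative assumptions, and the dissertation may compute either metric differently.

```python
def dice_coefficient(pred, truth):
    """Dice overlap 2|A & B| / (|A| + |B|) of two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly; treating this as 1.0 is a convention.
    return 1.0 if total == 0 else 2.0 * overlap / total


def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times  : observed follow-up time per patient
    events : 1 if the event (death, recurrence) was observed, 0 if censored
    risks  : predicted risk score, higher meaning earlier expected event
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only when the patient with the shorter
            # time actually had the event (was not censored first).
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A C-index of 0.5 corresponds to random ordering and 1.0 to a perfect ranking of patients by risk, which gives a scale for reading the 0.710 and 0.713 values quoted above.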
Keywords	Computed tomography (CT); deep learning; tumor segmentation; semi-supervised learning; prognostic analysis
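Language	Chinese
Seven major research directions: sub-direction classification	Medical image processing and analysis
Document type	Dissertation
Item identifier	http://ir.ia.ac.cn/handle/173211/24000
Collection	Graduates_Doctoral Dissertations; State Key Laboratory of Management and Control for Complex Systems
Recommended citation (GB/T 7714)	王硕 (Wang Shuo). Research on Deep Learning-Based Algorithms for Small-Sample Tumor CT Image Analysis [D]. Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences, 2019.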

Files in this item
File name/size	Document type	Version type	Access type	License
博士论文-王硕vr20-完整.pdf (6465 KB)	Dissertation		Restricted access	CC BY-NC-SA