CASIA OpenIR
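Research on Weakly Supervised Machine Learning Algorithms for Medical Imaging Data
杨萌林 (Yang Menglin)
Subtype: Master's thesis
Thesis Advisor: 张文生 (Zhang Wensheng)
2019-05-24
Degree Grantor: University of Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Name: Master of Engineering
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: medical imaging; weakly supervised learning; chest disease diagnosis; class activation map; classification algorithm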
Abstract

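Weakly supervised learning is a cost-effective machine learning approach: for example, it uses coarse-grained image-level category labels to accomplish fine-grained tasks such as object recognition and localization. With the help of deep neural networks, (fully) supervised learning has achieved breakthroughs across computer vision tasks; in image recognition and understanding, and in object detection and localization, deep learning has reached or approached human-level performance. Medical image processing (CT, X-ray, MRI, etc.), however, is still at an early stage. Computer-aided diagnosis in the clinic differs from general computer vision tasks: we want the computer not only to identify the disease in an image accurately but, more importantly, to provide the corresponding diagnostic evidence and explain how the diagnosis was reached. Fully supervised learning would require extensive annotation of pathological regions. Such detailed annotation is time-consuming and labor-intensive and, more importantly, requires expert knowledge, which makes it very hard to obtain. Relying on large numbers of fine-grained annotations to make diagnosis interpretable is therefore impractical. Image-level category labels for medical images, by contrast, are easy to obtain; they can even be extracted automatically from diagnostic reports with natural language processing (NLP) techniques. Starting from the current state and needs of medical image diagnosis, this thesis studies weakly supervised learning on medical images with two aims: (1) to automatically identify disease categories from large amounts of image data; and (2) to localize lesion regions by mining the latent supervisory information they carry, so as to offer clinicians suggestions and reference. Disease recognition is implemented as image classification, and lesion localization is achieved with class activation maps (CAMs). Image classification extracts and learns class-discriminative information from an image and outputs a probability for each category; a CAM is a feature map with high-level semantic information produced during this classification process. With only category labels, however, the features a classifier learns usually capture global semantics, and the resulting CAMs are typically sparse, discontinuous, and incomplete. In the context of medical imaging, this thesis therefore studies how to classify diseases under weak supervision while mining latent local position information to obtain more complete CAMs, and how to use CAMs to further improve weakly supervised algorithms.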
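To see why sparse or fragmented CAMs hurt localization, it helps to look at how a CAM is usually turned into a box: scale the map to [0, 1], keep the pixels above a threshold, and take their bounding box, which is then scored against an annotated box with an overlap measure such as IoU. The sketch below assumes a fixed threshold of 0.5 and inclusive integer boxes; both are illustrative choices, not the evaluation protocol stated in the thesis.

```python
import numpy as np


def cam_to_box(cam, threshold=0.5):
    """Box (x0, y0, x1, y1), inclusive, around pixels >= threshold after min-max scaling.

    Returns None when the map is constant, since no region then stands out.
    The 0.5 default is an illustrative assumption.
    """
    cam = np.asarray(cam, dtype=float)
    span = cam.max() - cam.min()
    if span == 0:
        return None
    norm = (cam - cam.min()) / span          # scale to [0, 1]
    ys, xs = np.nonzero(norm >= threshold)   # rows are y, columns are x
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


def box_iou(a, b):
    """Intersection over union of two inclusive integer boxes (x0, y0, x1, y1)."""
    def area(r):
        return (r[2] - r[0] + 1) * (r[3] - r[1] + 1)

    inter_w = min(a[2], b[2]) - max(a[0], b[0]) + 1
    inter_h = min(a[3], b[3]) - max(a[1], b[1]) + 1
    inter = max(0, inter_w) * max(0, inter_h)
    return inter / (area(a) + area(b) - inter)


cam = np.zeros((10, 10))
cam[2:5, 3:7] = 1.0    # strong response: rows 2-4, columns 3-6
cam[8, 8] = 0.3        # weak, isolated response that the threshold discards
pred = cam_to_box(cam)  # (3, 2, 6, 4)
```

Only the strongest blob survives the threshold, so a CAM that highlights just part of a lesion yields a box that covers just that part, which is the incompleteness problem described above.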

The main work and contributions of this thesis are as follows:

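(1) A multi-scale dilated convolutional neural network (Multiple Dilated Convolution Neural Network, MDCNN) is proposed. Chest radiographs often show co-occurring diseases whose lesions vary in shape and location. To address this, MDCNN introduces a multi-scale dilated convolution (MDC) module that discovers more disease-related regions at different scales and keeps the model from settling into a local optimum, thereby capturing more useful information for localizing the disease. This multi-scale feature learning is carried out at the level of the class activation map. In previous work, CAMs are usually obtained with global average pooling (GAP), and the corresponding weights must be extracted or computed from the network to build the CAM indirectly. This thesis skips that step and proposes a way to obtain CAMs directly that can be embedded in the network; experiments further demonstrate its effectiveness with other pooling methods, and its equivalence to the original method is proved theoretically. Combining MDC with global max pooling (GMP), the thesis designs an end-to-end MDCNN and conducts extensive experiments on ChestX-ray14, a large chest radiograph dataset of about 110,000 images. Compared with a range of related models, the proposed model substantially improves both classification and localization in chest disease diagnosis. Relative to ResNet, the baseline of the same type, it improves by 7.53% and reaches an AUC of 0.8204, with a marked gain in localization accuracy as well.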
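The difference between the indirect and direct routes to a CAM can be illustrated with a minimal NumPy sketch. It assumes the class-specific weights are realised as a 1×1 convolution placed before the global pooling layer; this is an assumption about the design, since the abstract gives no layer details. The class maps then come straight out of the forward pass. With GAP the scores match the classical "pool, then linear layer" head exactly, because averaging commutes with a linear map; with GMP each score is simply the peak of its class map.

```python
import numpy as np


def conv1x1(features, weights, bias=None):
    """1x1 convolution: (K, H, W) features -> (C, H, W) per-class maps (the CAMs)."""
    maps = np.einsum("ck,khw->chw", weights, features)
    if bias is not None:
        maps = maps + bias[:, None, None]
    return maps


def global_pool(maps, mode="max"):
    """Collapse each (H, W) class map to one score: GMP ('max') or GAP ('avg')."""
    if mode == "max":
        return maps.max(axis=(1, 2))
    if mode == "avg":
        return maps.mean(axis=(1, 2))
    raise ValueError(f"unknown pooling mode: {mode}")


rng = np.random.default_rng(1)
feats = rng.random((16, 8, 8))       # K=16 channels on an 8x8 grid (arbitrary sizes)
W = rng.standard_normal((14, 16))    # C=14 classes, one weight vector per class
b = rng.standard_normal(14)

cams = conv1x1(feats, W, b)          # CAMs obtained directly, no weight extraction step
scores_gap = global_pool(cams, "avg")
scores_gmp = global_pool(cams, "max")

# Indirect route: pool the features first, then apply the linear classifier.
reference = W @ feats.mean(axis=(1, 2)) + b
```

Because the CAMs are an intermediate tensor rather than something reconstructed afterwards, swapping the pooling operator (GAP, GMP or another rule) changes only the last line of the head.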

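(2) A residual network with a non-local spatial attention mechanism (ResNet-SNA) is proposed. Lesion and non-lesion regions differ in a characteristic way: similarity is high within each kind of region and low between them. Starting from non-local feature learning, the network builds a graph-model structure to compute the similarity between pixels of the feature map and thereby mine latent pathologically discriminative regions. ResNet-SNA further improves the classification AUC and considerably improves localization. The model was likewise validated on ChestX-ray14, where it reached a classification AUC of 0.8247.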
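A graph over feature-map pixels whose edge weights are pairwise similarities is what a non-local block computes. The sketch below uses the embedded-Gaussian form from the non-local neural networks of Wang et al. as a stand-in; the projection sizes, the softmax normalisation and the residual output are assumptions, and the actual ResNet-SNA design may differ.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def nonlocal_spatial_attention(x, w_theta, w_phi, w_g, w_out):
    """Embedded-Gaussian non-local block over spatial positions.

    x: (C, H, W) feature map; w_theta, w_phi, w_g: (D, C); w_out: (C, D).
    Position i attends to every position j with weight
    softmax_j(theta(x_i) . phi(x_j)), i.e. a fully connected graph on pixels.
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)                  # (C, N), N = H*W positions
    theta = w_theta @ flat                      # (D, N) queries
    phi = w_phi @ flat                          # (D, N) keys
    g = w_g @ flat                              # (D, N) values
    affinity = softmax(theta.T @ phi, axis=1)   # (N, N) pixel similarities, rows sum to 1
    y = g @ affinity.T                          # (D, N): y_i = sum_j a_ij * g_j
    out = flat + w_out @ y                      # residual connection keeps the input path
    return out.reshape(c, h, w), affinity


rng = np.random.default_rng(2)
x = rng.standard_normal((8, 5, 5))                          # C=8 channels, 5x5 grid
params = [rng.standard_normal((4, 8)) for _ in range(3)]    # theta, phi, g with D=4
z, attn = nonlocal_spatial_attention(x, *params, rng.standard_normal((8, 4)))
```

Pixels inside a lesion that resemble each other receive large mutual weights, so each one aggregates evidence from the whole region rather than only from its local receptive field.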

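(3) A weakly supervised learning model, ResNet-CE, enhanced by a structured class activation map (CAM based on Structure, Struct-CAM), is proposed. Building on the two works above, this part approaches weakly supervised learning from the angle of using CAMs to enhance classification. With only image-level disease labels, the model strengthens classification performance, uses multi-scale convolution kernels to force the mining of more discriminative regions, and imposes a constraint through a purpose-designed spatial-wise pooling (SWP), so that good localization is obtained alongside good classification. ResNet-CE was evaluated on chest radiograph diagnosis and on the natural image datasets CIFAR-10, CIFAR-100 and STL-10. Compared with the two previously proposed methods, it further improves classification and localization, reaching a classification AUC of 0.8251. On natural images, ResNet-CE was compared with mainstream models such as VGG and ResNet and showed a clear improvement.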
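The AUC values reported above (0.8204, 0.8247, 0.8251) are, as is common for this multi-label benchmark, presumably averages of per-disease ROC AUCs; the abstract does not state the averaging, so treat that as an assumption. A self-contained sketch of that metric, computing each AUC as the probability that a random positive outscores a random negative (ties count one half):

```python
import numpy as np


def binary_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic (ties count as 1/2)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    # Compare every positive with every negative: O(P*N), fine for a sketch.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


def mean_auc(label_matrix, score_matrix):
    """Average of per-class AUCs for multi-label data shaped (n_images, n_classes)."""
    label_matrix = np.asarray(label_matrix)
    score_matrix = np.asarray(score_matrix)
    per_class = [binary_auc(label_matrix[:, c], score_matrix[:, c])
                 for c in range(label_matrix.shape[1])]
    return float(np.mean(per_class)), per_class


labels = np.array([[0, 1], [0, 0], [1, 1], [1, 0]])
scores = np.array([[0.1, 0.9], [0.4, 0.2], [0.35, 0.7], [0.8, 0.1]])
avg, per_class = mean_auc(labels, scores)   # per_class == [0.75, 1.0]
```

The quadratic pairwise comparison keeps the definition visible; for ChestX-ray14-sized test sets a rank-based or library implementation would be the practical choice.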

Other Abstract

Weakly supervised learning is a cost-effective machine learning approach, for example using coarse-grained image-level category labels to accomplish fine-grained tasks such as object detection and localization. With the help of deep neural networks, fully supervised learning has made breakthroughs in many computer vision tasks, such as image recognition and understanding and object detection and localization; medical image processing and understanding (CT, X-ray, MRI, etc.), however, is still at an early stage. Computer-aided diagnosis in clinical practice differs from common computer vision tasks: we want the computer not only to diagnose the disease in an image accurately but, more importantly, to give the corresponding diagnostic evidence. Fully supervised learning, however, requires heavy labelling work that is labor-intensive, time-consuming, and dependent on expertise, which is hard to accomplish. It is therefore unrealistic to produce large numbers of fine-grained labels to localize diseases. By contrast, image-level labels of medical images are easy to acquire and can even be obtained automatically from diagnostic reports with natural language processing (NLP). Based on these facts, this thesis studies weakly supervised learning for medical images from the perspective of practical requirements and available data, aiming at (1) automatically identifying disease categories from large amounts of image data; and (2) localizing lesion areas by mining implicit features of potentially discriminative regions, in order to provide suggestions and guidance to clinicians. In this thesis, disease diagnosis is realized by an image classification algorithm, and lesion localization by harvesting class activation maps (CAMs). Image classification extracts discriminative features and outputs a probability for each category.
A class activation map is a feature map with high-level semantic information related to the corresponding category. With only image-level labels, however, the features learned by a classification algorithm generally capture global semantics, and the resulting CAMs are usually sparse, discontinuous, and incomplete. In the context of medical imaging, this thesis therefore mainly studies how to diagnose diseases with weak (image-level) annotations while mining latent location information to obtain more complete CAMs, and how to use CAMs to further enhance weakly supervised or classification algorithms.

The main contributions of the thesis:

(1) A multiple dilated convolution neural network (MDCNN) is proposed. To cope with concomitant diseases in chest radiographs and the widely varying sizes of local pathological regions, MDCNN introduces a multi-scale dilated convolution (MDC) module that finds more disease-related regions at different scales. This keeps the model from falling into local minima and captures more useful information for localizing the disease; the multi-scale features are learned at the level of the class activation map. In previous studies, the CAM is generally obtained with global average pooling (GAP), which requires extracting or calculating the corresponding weights from the network to build the CAM indirectly. This thesis proposes a method that obtains the CAM directly and can be embedded in the network, forming an end-to-end model; the work further demonstrates the method's effectiveness with other pooling methods. By combining MDC and global max pooling (GMP), we designed an end-to-end MDCNN model and conducted extensive experiments on ChestX-ray14, a large chest X-ray dataset of 112,120 images. A comprehensive comparison with related models shows that the proposed model greatly improves both classification and localization in chest disease diagnosis. Compared with the baseline ResNet, MDCNN improves the AUC by 7.53% to 0.8204, and localization accuracy also improves substantially.
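Dilated convolutions enlarge a kernel's receptive field without adding weights: a k×k kernel with dilation d spans k + (k − 1)(d − 1) pixels per side. Below is a minimal single-channel NumPy sketch of a multi-scale block, assuming parallel 3×3 branches with dilations 1, 2 and 3 fused by averaging; the rates and the fusion rule are illustrative assumptions, not the MDC module's published configuration.

```python
import numpy as np


def effective_kernel_size(k, d):
    """Span of a k x k kernel with dilation d: k + (k - 1) * (d - 1)."""
    return k + (k - 1) * (d - 1)


def dilated_conv2d(x, kernel, dilation=1):
    """'Same'-padded 2-D cross-correlation of a single-channel map with a dilated kernel."""
    k = kernel.shape[0]                      # assumes a square, odd-sized kernel
    pad = (effective_kernel_size(k, dilation) - 1) // 2
    xp = np.pad(x, pad)                      # zero padding keeps the output size
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(k):
        for j in range(k):
            # Tap (i, j) reads the input shifted by (i * dilation, j * dilation).
            out += kernel[i, j] * xp[i * dilation:i * dilation + h,
                                     j * dilation:j * dilation + w]
    return out


def multi_scale_dilated(x, kernels, dilations=(1, 2, 3)):
    """One branch per dilation rate, averaged (the fusion rule is an assumption)."""
    branches = [dilated_conv2d(x, kern, d) for kern, d in zip(kernels, dilations)]
    return np.mean(branches, axis=0), branches


x = np.zeros((9, 9))
x[4, 4] = 1.0                                # unit impulse in the centre
kernels = [np.ones((3, 3))] * 3
fused, branches = multi_scale_dilated(x, kernels)
```

The impulse response makes the effect visible: each branch uses the same nine weights, but with dilation 2 the taps land two pixels apart and with dilation 3 three pixels apart, so the fused output responds to structures at several scales at once.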

(2) A residual network with a non-local spatial attention mechanism (ResNet-SNA) is proposed. Lesion and non-lesion areas differ in that similarity within each type of area is relatively high, while similarity between them is low. ResNet-SNA is therefore built on non-local feature learning: the model computes the similarity between elements of the feature map by building a graph model over them, and uses this similarity to discover potential pathologically discriminative regions. Extensive experiments show that ResNet-SNA further improves the classification AUC and substantially improves localization. The proposed model was also validated on ChestX-ray14, reaching an AUC of 0.8247.

(3) A weakly supervised learning model, ResNet-CE, with a CAM based on a custom structure (Struct-CAM) is proposed. Building on the previous two chapters, we study weakly supervised learning from the perspective of using class activation maps. With only image-level labels, ResNet-CE uses multi-scale convolution kernels to force the model to mine more discriminative regions and aggregates several weighted CAMs to enhance classification performance. With a spatial-wise pooling (SWP) designed as a constraint, ResNet-CE achieves better classification and better localization. ResNet-CE was tested and validated on chest X-rays and on natural images (CIFAR-10, CIFAR-100 and STL-10). Compared with the previous two methods, it further improves classification and localization, with a classification AUC of 0.8251. On natural images, ResNet-CE was compared with mainstream models such as VGG and ResNet and also shows a significant improvement.

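Subject Area: Computer Science and Technology; Artificial Intelligence; Pattern Recognition
MOST Discipline Catalogue: Engineering::Control Science and Engineering; Engineering::Computer Science and Technology (Engineering or Science degrees may be conferred)
Pages: 97
Funding Project: National Natural Science Foundation of China [U1636220]
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/23933
Collection: Institute of Automation, Chinese Academy of Sciences
Research Center of Precision Sensing and Control / Artificial Intelligence and Machine Learning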
Recommended Citation
GB/T 7714
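杨萌林 (Yang Menglin). Research on Weakly Supervised Machine Learning Algorithms for Medical Imaging Data [D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2019.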
Files in This Item:
File Name/Size | DocType | Version | Access | License
Thesis.pdf (7418 KB) | Thesis | | Open Access | CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.