|Place of Conferral||Institute of Automation, Chinese Academy of Sciences|
|Keyword||domain adaptation; transfer learning; distribution alignment; cross-domain label propagation; subspace learning|
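1. To address the under-adaptation that arises when existing distribution discrepancy metrics are used to match the source and target domains, a class centroid matching method based on the cluster structures of the two domains, CMMS (class Centroid Matching and local Manifold Self-learning), is proposed. First, CMMS studies a class centroid matching strategy that matches more complex distribution patterns and thereby alleviates under-adaptation. Second, a source-domain discriminative structure preservation strategy and a target-domain local manifold self-learning strategy are introduced to improve the cluster quality of the source and target domains, respectively, which makes class centroid matching more efficient. Next, an iterative optimization algorithm is designed and its convergence is proved. Finally, a concise semi-supervised extension is proposed so that the optimization algorithm applies directly to the semi-supervised extension of CMMS. Experiments on six public image datasets verify that CMMS alleviates under-adaptation and performs strongly in both unsupervised and semi-supervised domain adaptation.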
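2. To address the underfitting caused by directly using a conventional classifier to predict pseudo-labels for target-domain samples, a cross-domain label propagation method with adaptive discriminative graph learning, CDGS (Cross-Domain label propagation with discriminative Graph Self-learning), is proposed. First, a unified learning framework is built that jointly optimizes domain-invariant feature learning, affinity matrix construction, and cross-domain label propagation; this improves the quality of the affinity matrix and effectively alleviates underfitting. Second, an adaptive discriminative graph learning strategy is introduced that not only captures the intrinsic similarity of cross-domain samples but also makes full use of the discriminative information in the ground-truth labels of source samples and the pseudo-labels of target samples. Finally, an iterative optimization algorithm is designed for CDGS that applies directly to semi-supervised domain adaptation. Experiments on six public image datasets show that CDGS effectively alleviates underfitting and significantly outperforms the comparison methods in both unsupervised and semi-supervised domain adaptation.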
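3. To address the negative transfer that occurs when the label distributions of the source and target domains are mismatched, two domain adaptation methods based on label distribution matching are proposed. The first targets the case where the two domains share the same label set but the proportion of samples in the same class differs greatly: a two-stage domain adaptation method, TSDA (Two Stage Domain Adaptation), aligns the marginal distribution in a marginal distribution alignment stage and the posterior distribution in an adaptive classifier learning stage, which achieves joint distribution alignment and alleviates negative transfer. The second targets the case where the source classes completely cover the target classes: a progressive target sample learning method for shared classes, PLSC (Progressive target sample Learning of Shared Classes). A shared-class identification strategy with an adaptive threshold is designed to alleviate the negative transfer caused by source-private classes, and a self-paced learning mechanism is introduced for target samples to improve the quality of subspace learning and thus identify shared classes more accurately. Experiments on five public image datasets show that the proposed methods effectively alleviate the negative transfer that occurs when label distributions are mismatched.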
With the rapid development of information technology, machine learning (ML) methods have shown great potential in medical image analysis. The strong performance of most traditional ML methods relies on the basic assumption that training data and test data follow the same distribution. In practice, however, training data (the source domain) and test data (the target domain) often violate this assumption because images are collected in different ways and in different environments. In addition, annotating image data is time-consuming, laborious, and costly, which makes it difficult to obtain a large number of labeled samples. Domain adaptation (DA) uses the intrinsic similarity between the source domain and the target domain as a bridge to transfer and reuse knowledge between the two domains. DA can improve the generalization ability of ML methods and alleviate the scarcity of annotations in the era of big data, which gives it significant value in both theory and practice.
In recent years, many DA learning methods for images have been proposed in academia. These methods greatly improve classification performance on the target domain, but they still face three pressing problems: under-adaptation, underfitting, and negative transfer. This dissertation studies methods for each of the three problems and, on this basis, selects three typical medical image classification tasks, namely pneumonia image classification, breast cancer pathological image classification, and skin cancer image classification, for application verification. The main research work and contributions of this dissertation are as follows:
1. To address the under-adaptation problem that arises when current distribution discrepancy metrics are used to align the source domain and the target domain, a class centroid matching method based on the cluster structures of the two domains, CMMS (class Centroid Matching and local Manifold Self-learning), is proposed. First, CMMS investigates a class centroid matching strategy, so that more complicated distribution patterns can be matched and the under-adaptation problem can be alleviated. Second, CMMS introduces a source discriminative structure preserving strategy and a target local manifold self-learning strategy to improve the quality of the source clusters and the target clusters, respectively. Next, an efficient iterative optimization algorithm is designed, and its convergence is proved. Finally, a simple extension scheme for semi-supervised DA is proposed, so that the optimization algorithm can be applied directly to the semi-supervised extension of CMMS. Experiments on six public image datasets verify the effectiveness of CMMS in alleviating the under-adaptation problem and its superiority in both unsupervised and semi-supervised DA learning.
2. To address the underfitting problem that arises when a standard classifier is used to predict the pseudo-labels of target samples, a cross-domain label propagation method based on adaptive discriminative graph learning, CDGS (Cross-Domain label propagation with discriminative Graph Self-learning), is proposed. First, a unified learning framework is established that integrates domain-invariant feature learning, affinity matrix construction, and cross-domain label propagation, so that the quality of the affinity matrix can be improved and the underfitting problem can be alleviated. Then, an adaptive discriminative graph learning strategy is introduced, which not only captures the inherent similarity between cross-domain samples but also makes full use of the discriminative information contained in the ground-truth labels of source samples and the pseudo-labels of target samples. Finally, an effective iterative optimization algorithm is designed that can be applied directly to semi-supervised DA. Experiments on six public image datasets show that CDGS effectively alleviates the underfitting problem and is significantly superior to the comparison methods in both unsupervised and semi-supervised DA learning.
3. To address the negative-transfer problem that appears when the label distributions of the source domain and the target domain are mismatched, two DA methods based on label distribution matching are proposed. First, for the case where the two domains share the same label set but the sample proportions of the same category differ greatly, a Two Stage DA (TSDA) method is proposed. The marginal distribution is aligned in the marginal distribution alignment stage, and the posterior distribution is aligned in the adaptive classifier learning stage. Joint distribution alignment is thus achieved and the negative-transfer problem is alleviated. Second, for the case where the labels of the source domain completely cover the labels of the target domain, a Progressive target sample Learning of Shared Classes (PLSC) method is proposed. A shared-class identification strategy based on an adaptive threshold is designed to alleviate the negative-transfer problem caused by source-private classes. In addition, a self-paced learning mechanism is introduced for target samples to improve the quality of subspace learning, which helps to identify shared classes more accurately. Experiments on five public image datasets show that the proposed methods effectively alleviate the negative-transfer problem that appears when label distributions are mismatched.
4. This dissertation selects three typical medical image classification tasks, namely pneumonia image classification, breast cancer pathological image classification, and skin cancer image classification, for application verification of the four methods above. For each task, a cross-center image dataset is constructed to evaluate the effectiveness of DA methods objectively; the corresponding DA tasks are carefully designed, and appropriate and fair evaluation metrics are selected, in order to explore the effectiveness of the proposed methods on cross-center medical image datasets; and the proposed methods are compared with existing related advanced methods. The experimental results verify the effectiveness and superiority of the proposed methods on the three tasks and show their significant application value in practical medical image classification.
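As a rough illustration of two ideas named in items 1 and 2, the following minimal Python sketch assigns target pseudo-labels by nearest source class centroid and, separately, by classic label propagation over a fixed Gaussian affinity graph. It is not the CMMS or CDGS algorithm from the dissertation: there is no learned projection, no local manifold self-learning, and no discriminative graph self-learning. The toy data, the function names (`class_centroids`, `nearest_centroid_labels`, `propagate_labels`), the kernel bandwidth, and the `alpha` and `n_iter` values are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (an assumption for illustration): two Gaussian classes per domain,
# with the target domain shifted relative to the source domain.
means = np.array([[0.0, 0.0], [4.0, 4.0]])
shift = np.array([1.0, -0.5])
n_per_class = 20
ys = np.repeat([0, 1], n_per_class)   # source labels (known)
yt = np.repeat([0, 1], n_per_class)   # target labels (used only for scoring)
Xs = means[ys] + 0.5 * rng.standard_normal((2 * n_per_class, 2))
Xt = means[yt] + shift + 0.5 * rng.standard_normal((2 * n_per_class, 2))


def class_centroids(X, y, n_classes):
    """Mean feature vector of each class; X is (n, d), y holds integer labels."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])


def nearest_centroid_labels(X, centroids):
    """Assign each row of X to its closest centroid (squared Euclidean distance)."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def propagate_labels(W, Y0, alpha=0.9, n_iter=50):
    """Iterative label propagation on a fixed, symmetric affinity matrix W.

    Y0 is (n, C): one-hot rows for labeled samples, all-zero rows otherwise.
    """
    deg = W.sum(axis=1)
    S = W / np.sqrt(np.outer(deg, deg))   # D^{-1/2} W D^{-1/2}
    F = Y0.astype(float)
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1.0 - alpha) * Y0
    return F.argmax(axis=1)


# 1) Pseudo-labels from the nearest source class centroid.
pseudo = nearest_centroid_labels(Xt, class_centroids(Xs, ys, 2))
acc_centroid = float((pseudo == yt).mean())

# 2) Pseudo-labels from label propagation over source and target samples.
X = np.vstack([Xs, Xt])
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
W = np.exp(-d2 / 2.0)                 # Gaussian kernel with bandwidth 1 (assumed)
np.fill_diagonal(W, 0.0)              # no self-loops
Y0 = np.zeros((len(X), 2))
Y0[np.arange(len(ys)), ys] = 1.0      # only source rows carry labels
propagated = propagate_labels(W, Y0)[len(ys):]
acc_prop = float((propagated == yt).mean())

print(f"nearest-centroid accuracy: {acc_centroid:.2f}")
print(f"label-propagation accuracy: {acc_prop:.2f}")
```

In this sketch the affinity matrix is built once from raw features and never updated. The abstract describes CDGS as optimizing feature learning, affinity matrix construction, and label propagation jointly, which the sketch does not attempt.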
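|Tian Lei (田磊). Image-Oriented Domain Adaptation Learning Methods and Their Medical Applications (面向图像的域自适应学习方法与医疗应用研究) [D]. Institute of Automation, Chinese Academy of Sciences, 2022.|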