Research on Feature-Learning-Oriented Open-Set Image Recognition Methods


	Research on Feature-Learning-Oriented Open-Set Image Recognition Methods
	孙珈因 (Sun Jiayin)
	2024-05-18
Pages	148
Degree type	Doctoral
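Abstract (translated from Chinese)	Image recognition is a fundamental problem in computer vision. Most existing methods assume a closed-set test environment that contains the same object classes as the training set. However, many real-world test environments are open (open-set) environments that contain novel object classes absent from the training set (called unknown classes), and closed-set recognition methods cannot identify these unknown classes. Open-set image recognition has therefore attracted wide attention from researchers in recent years. A core problem in open-set recognition is feature learning: how to build, from a training set that contains only known-class objects, a feature learning model whose learned image features both discriminate the known classes effectively and identify unknown classes. This dissertation focuses on feature learning in open-set recognition and investigates it from four perspectives: feature distribution, fine-grained feature discriminability, feature confounding, and feature domain. The main work is summarized as follows:
• The sample feature distributions in open-set image recognition often contain super-Gaussian or sub-Gaussian components, which the single Gaussian or Gaussian mixture models commonly used by existing methods struggle to model effectively. To address this problem, and considering that a mixture of exponential power distributions can effectively model super-Gaussian and sub-Gaussian components, this dissertation proposes an open-set recognition method based on auto-encoding a mixture of exponential power distributions. The method introduces an auto-encoder that models the feature distribution of each known class as a mixture of exponential power distributions in the latent space. Accordingly, a differentiable sampler is introduced so that the auto-encoder can be trained by backpropagation. Finally, a linear classifier is trained on the parameters of the fitted mixture of exponential power models to perform open-set image recognition. Comparative results on multiple public coarse-grained image datasets show that the method models feature distributions effectively and that its open-set recognition performance surpasses that of several current mainstream methods.
• Some object classes in open-set image recognition differ only in fine-grained details, and most existing methods struggle to extract strongly discriminative features from images of such classes. To address this problem, inspired by the temporal mechanisms of recognition in the brain, this dissertation proposes an open-set recognition method based on a hierarchical attention network and another based on a complementary frequency-varying awareness network. The hierarchical attention network temporally integrates two kinds of features, the hierarchical features of a multi-layer attention network and the contextual features of each layer, to guide the attention network toward more accurate attention. The complementary frequency-varying awareness network applies high-pass and low-pass filtering over different frequency bands followed by temporal integration, so that the network perceives the temporal progression of feature frequencies from arbitrary high and low frequencies to the full band, which improves its ability to capture high- and low-frequency information. Both networks use the learned discriminative features to train linear classifiers for open-set image recognition. Comparative results on multiple public coarse-grained and fine-grained image datasets show that the two individual methods, as well as their combination, effectively improve fine-grained feature discriminability and that their open-set recognition performance surpasses that of several current mainstream methods.
• The features learned by most existing open-set image recognition methods are inevitably mixed with non-discriminative feature components, which reduces feature discriminability. To address this problem, this dissertation proposes an open-set recognition method based on a recursive counterfactual deconfounding model. The method iteratively performs a counterfactual feature learning step and a feature deconfounding step. The counterfactual features, which express the confounders latent in the original features, are obtained through a learnable strategy; a negative correlation constraint is introduced to further remove the influence of the confounders; and a recursive scheme is introduced to progressively learn and remove subtler confounders. Finally, the deconfounded features are used to train a linear classifier for open-set image recognition. Comparative results on multiple public coarse-grained and fine-grained image datasets show that the method effectively weakens the influence of confounders, thereby improving feature discriminability, and that its open-set recognition performance surpasses that of several current mainstream methods.
• In open-set image recognition there is a semantic shift between the test feature domain and the training feature domain, yet the vast majority of existing methods are inductive and struggle to handle this feature domain shift. To address this problem, and considering that transductive methods can in principle handle domain shift effectively, this dissertation proposes a transductive open-set recognition framework based on sample selection and feature generation, which iteratively performs reliability sampling, feature generation, and baseline update. First, a dual-space consistent sampling method is introduced to select test samples whose pseudo-labels are more reliable. Next, a conditional dual-adversarial generative network is introduced to generate features that balance the numbers of known-class and unknown-class training samples. Finally, the baseline model is updated with the selected samples, the generated features, and the original training samples to perform open-set image recognition. Comparative results on multiple public coarse-grained and fine-grained image datasets show that the transductive open-set recognition methods derived from this framework effectively mitigate feature domain shift and that their open-set recognition performance surpasses that of several current mainstream methods.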
English abstract	Image recognition is a fundamental task in computer vision. Most existing methods assume a closed-set test environment, i.e., one with the same classes as the training set. However, many real test environments are open sets: they contain new classes (called unknown classes) that do not appear in the training set and that closed-set recognition methods cannot identify. Hence, open-set image recognition has attracted extensive attention recently. A core issue in open-set image recognition is feature learning, i.e., how to construct a feature learning model from known-class training data so that the learned features can both classify known classes effectively and identify unknown classes. To address this issue, this dissertation explores four perspectives: feature distribution, fine-grained feature discriminability, feature confounding, and feature domain. The main contributions of this dissertation are as follows:
• The feature distributions in open-set image recognition generally contain super-Gaussian or sub-Gaussian components, which cannot be modeled effectively by the single Gaussian or Gaussian mixture models used in existing methods. To address this problem, considering that a mixture of exponential power distributions can effectively model super-Gaussian and sub-Gaussian components, an open-set recognition method based on auto-encoding the mixture of exponential power distributions is proposed. Specifically, an auto-encoder is introduced that models the feature distribution of each known class as a mixture of exponential power distributions in the latent space. Accordingly, a differentiable sampler is introduced so that the auto-encoder can be trained by gradient backpropagation. Finally, a linear classifier is trained on the parameters of the fitted mixtures of exponential power distributions for open-set image recognition. Experimental results on multiple public coarse-grained image datasets demonstrate that the proposed method models the feature distributions effectively and that its open-set recognition performance is better than that of several current mainstream methods.
• Some object classes in open-set image recognition differ only in fine-grained details, and most existing methods have difficulty extracting strongly discriminative features from images of such classes. To address this problem, inspired by the temporal mechanisms of recognition in the brain, an open-set recognition method based on a hierarchical attention network and another based on a complementary frequency-varying awareness network are proposed. The former temporally aggregates the hierarchical features from a multi-layer attention network and the contextual features from each layer, so as to guide the attention network to learn more accurate attention. The latter applies high-pass and low-pass filtering to a feature at different frequencies and then temporally aggregates the decomposed feature components, which makes the network aware of the temporal progression from arbitrary high- and low-frequency feature components to the full-band feature and thereby improves its ability to capture high- and low-frequency information. Both networks use the learned discriminative features to train linear classifiers for open-set image recognition. Experimental results on multiple public coarse-grained and fine-grained image datasets demonstrate that the proposed methods effectively improve fine-grained feature discriminability and that their open-set recognition performance is better than that of several current mainstream methods.
• The features learned by most existing open-set image recognition methods are inevitably confounded with non-discriminative feature components, which harms feature discriminability. To address this problem, an open-set recognition method based on a recursive counterfactual deconfounding model is proposed. This method iteratively learns counterfactual features and performs feature deconfounding based on them. Specifically, the counterfactual features, which express the implicit confounders hidden in the original features, are obtained by a learnable strategy. In addition, a negative correlation constraint is introduced to further alleviate the effects of the confounders, and a recursive scheme is introduced to gradually learn and alleviate subtler confounders. Finally, the deconfounded features are used to train a linear classifier for open-set image recognition. Experimental results on multiple public coarse-grained and fine-grained image datasets demonstrate that the proposed method effectively alleviates the effects of the confounders and thus improves feature discriminability, and that its open-set recognition performance is better than that of several current mainstream methods.
• There is a semantic shift between the test and training feature domains in open-set image recognition. However, most existing methods are inductive and have difficulty handling this feature domain shift. To address this problem, considering that transductive methods can in principle handle domain shift effectively, a transductive open-set recognition framework based on sample selection and feature generation is proposed, which iteratively performs reliability sampling, feature generation, and baseline update. First, a dual-space consistent sampling approach is introduced to select test samples with more reliable pseudo-labels. Then, a conditional dual-adversarial generative network is introduced to generate features that balance the known-class and unknown-class training samples. Finally, the baseline model is updated with the selected samples, the generated features, and the original training samples for open-set image recognition. Experimental results on multiple public coarse-grained and fine-grained image datasets demonstrate that the transductive open-set recognition methods derived from the proposed framework effectively alleviate the feature domain shift and that their open-set recognition performance is better than that of several current mainstream methods.
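As a rough illustration of why exponential power components can capture both super-Gaussian and sub-Gaussian shapes, the sketch below uses the standard one-dimensional exponential power (generalized Gaussian) density. The function names and the parametrization (`mu`, `alpha`, `beta`) are assumptions chosen for illustration; this is not the dissertation's implementation, which works in an auto-encoder's latent space.

```python
import math


def exp_power_pdf(x, mu=0.0, alpha=1.0, beta=2.0):
    """Density of a 1-D exponential power (generalized Gaussian) distribution.

    mu is the location, alpha > 0 the scale, beta > 0 the shape.
    beta = 2 gives a Gaussian, beta = 1 a Laplace distribution.
    """
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-((abs(x - mu) / alpha) ** beta))


def excess_kurtosis(beta):
    """Closed-form excess kurtosis of the exponential power distribution.

    Positive values indicate a super-Gaussian (heavy-tailed) shape,
    negative values a sub-Gaussian (flat-topped) shape.
    """
    g = math.gamma
    return g(5.0 / beta) * g(1.0 / beta) / g(3.0 / beta) ** 2 - 3.0


def mixture_pdf(x, components):
    """Density of a mixture; components is a list of (weight, mu, alpha, beta)
    tuples whose weights are assumed to sum to 1."""
    return sum(w * exp_power_pdf(x, mu, a, b) for w, mu, a, b in components)


if __name__ == "__main__":
    for beta in (1.0, 2.0, 4.0):
        print(f"beta={beta}: excess kurtosis = {excess_kurtosis(beta):+.3f}")
```

In this parametrization a shape value below 2 yields positive excess kurtosis (super-Gaussian), a value above 2 yields negative excess kurtosis (sub-Gaussian), and a mixture can combine both kinds of component, which a Gaussian mixture cannot do with a single component per mode.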
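Keywords	Open-set recognition ; Distribution modeling ; Hierarchical attention ; Frequency-domain filtering ; Counterfactual deconfounding ; Transductive framework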
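Subject area	Pattern recognition ; Computer perception ; Computer neural networks
Discipline category	Engineering::Control Science and Engineering
Language	Chinese
Document type	Dissertation
Item identifier	http://ir.ia.ac.cn/handle/173211/56486
Collection	Graduates_Doctoral Dissertations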
Recommended citation (GB/T 7714)	孙珈因. 面向特征学习的图像开集识别方法研究[D],2024.

Files in this item
File name/size	Document type	Version type	Access type	License
论文最终版本-签名版.pdf (8220KB)	Dissertation		Restricted access	CC BY-NC-SA