English Abstract

Artificial intelligence has entered a new stage with the development of deep learning. Deep neural networks have made great progress in several fields, especially in recognition tasks of computer vision. Abundant image data can be obtained via the Internet. However, methods based on deep neural networks rely heavily on labeled data, and labeling these data manually is expensive and time-consuming. The generative adversarial network (GAN) offers a new way to address this problem. A GAN optimizes a generator and a discriminator simultaneously, whose objectives oppose each other; through this adversarial training, the generator learns to produce realistic virtual samples.
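The opposing objectives mentioned above correspond to the standard GAN minimax game (the well-known general formulation, stated here for context rather than taken from this dissertation):

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]$$

where $G$ maps a noise vector $z$ to a virtual sample and $D$ estimates the probability that its input is a real sample rather than a generated one.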
A large number of training samples can be generated through adversarial learning. Besides, by using a domain critic network and adversarial learning, samples from a labeled source domain and an unlabeled target domain can be mapped to domain-invariant feature representations, which helps transfer the discriminative information learned from the labeled source domain to the unlabeled target domain and improves performance on the target domain. This dissertation focuses on these two ideas; the specific research contents and contributions are summarized as follows:
To address the scarcity and imbalance of training data in the scene text recognition problem, this dissertation proposes a high-quality virtual sample generation framework based on image information disentangling. In this framework, the information of an image is disentangled into two parts, content and style, where the content carries the specific semantic information and the style carries everything else. By learning a style encoder, the framework can generate high-quality virtual samples that render specified content in diverse styles. For training, we design a double-cycle adversarial training strategy, which reduces over-fitting by exchanging the contents and styles of the inputs twice. Experimental results show that the generated virtual samples improve vehicle license plate recognition performance. Moreover, the method also applies to other image generation problems, such as generating Chinese characters in different fonts or generating images of shoes and handbags from edge maps.
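The double exchange of contents and styles can be illustrated with a minimal sketch. The encoders and decoder below are toy stand-ins (real networks are learned from data); the point is only the structure of the double cycle, in which swapping twice restores the original pairings and thus yields a reconstruction target:

```python
# Illustrative sketch of the double-cycle exchange idea. An "image" here
# is simply a (content, style) tuple, so encoding is an identity split;
# in the dissertation, encoder and decoder are learned networks.

def encode(image):
    """Split an image into its (content, style) factors."""
    content, style = image
    return content, style

def decode(content, style):
    """Recombine a content code and a style code into an image."""
    return (content, style)

def double_cycle(img_a, img_b):
    """Exchange the contents and styles of two images twice.

    After the first exchange each mixed image carries one input's
    content with the other's style; exchanging again restores the
    original pairings, giving a cycle-consistency reconstruction target.
    """
    c_a, s_a = encode(img_a)
    c_b, s_b = encode(img_b)
    # First exchange: cross-pair contents and styles.
    mixed_ab = decode(c_a, s_b)
    mixed_ba = decode(c_b, s_a)
    # Second exchange on the mixed images swaps the styles back.
    c_a2, s_b2 = encode(mixed_ab)
    c_b2, s_a2 = encode(mixed_ba)
    rec_a = decode(c_a2, s_a2)
    rec_b = decode(c_b2, s_b2)
    return rec_a, rec_b

img_a = ("plate:ABC123", "style:sunny")
img_b = ("plate:XYZ789", "style:night")
rec_a, rec_b = double_cycle(img_a, img_b)
assert rec_a == img_a and rec_b == img_b  # double exchange reconstructs both
```

In the learned setting, the reconstruction error between `rec_a`/`rec_b` and the inputs is what supervises training without paired data.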
Existing pedestrian detection models are not robust to scale and occlusion variations and generalize poorly in cross-domain testing. To address these issues, this dissertation proposes a novel data augmentation scheme based on person transfer. Specifically, pedestrian objects from other datasets are embedded into the target scene, which generates a large number of virtual samples with scale and occlusion variations. We then design an Attribute Preserving Generative Adversarial Network (APGAN), which improves the fidelity of the generated images while preserving the attributes of the embedded pedestrians. Finally, the generated images are used for data augmentation, which improves the robustness and generalization ability of pedestrian detectors by introducing scale and occlusion diversity into the training set. Experimental results show that this scheme improves the performance of several pedestrian detection methods on multiple benchmarks.
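The embedding step can be sketched as a simple copy-paste operation with controllable scale and occlusion (a hypothetical helper using plain NumPy arrays as image stand-ins; APGAN's refinement of the pasted result is not modeled here):

```python
import numpy as np

def paste_person(scene, person, top, left, scale=1, visible_frac=1.0):
    """Embed a person crop into a scene at (top, left).

    scale: integer upscaling factor, varying the pedestrian's size.
    visible_frac: fraction of the crop (from the top) kept visible,
    simulating occlusion of the lower body.
    """
    # Integer-factor nearest-neighbour upscaling.
    person = np.repeat(np.repeat(person, scale, axis=0), scale, axis=1)
    h, w = person.shape[:2]
    vis_h = max(1, int(round(h * visible_frac)))
    out = scene.copy()
    out[top:top + vis_h, left:left + w] = person[:vis_h]
    return out

scene = np.zeros((8, 8), dtype=np.uint8)
person = np.full((2, 2), 255, dtype=np.uint8)
# Pedestrian doubled in size, lower half occluded.
aug = paste_person(scene, person, top=1, left=1, scale=2, visible_frac=0.5)
```

Varying `top`, `left`, `scale`, and `visible_frac` across many pastes is what yields the scale and occlusion diversity described above.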
Most existing person pose transfer methods require identity information to build pairs of person images with the same identity but different poses, and they also suffer from background over-fitting. To tackle these problems, this dissertation proposes an unsupervised cycle-consistent person pose transfer approach. It is trained on unpaired cross-identity person images and preserves background information well. Compared with previous methods, the proposed approach achieves better results on the cross-identity person pose transfer task and comparable results on the self-identity one. Moreover, the method serves as an effective data augmentation scheme for person recognition tasks, as validated by experiments on pedestrian re-identification and detection.
To address the severe performance drop of object re-identification (re-ID) models under cross-domain testing, this dissertation proposes a novel unsupervised domain adaptation framework for object re-ID that fuses feature adversarial learning with self-similarity clustering. Previous work has argued that domain-invariant feature learning methods are unsuitable for the re-ID problem because the label spaces of the source and target domains are entirely disjoint. This dissertation questions that opinion and is the first to introduce feature adversarial learning into the re-ID problem. The proposed framework minimizes the discrepancy between source- and target-domain feature representations, measured by adversarial training, and thereby learns domain-invariant feature representations; experimental results show the effectiveness of this module. Besides, this dissertation designs a self-similarity clustering module to mine the implicit similarity relationships among the unlabeled samples of the target domain. Through unsupervised clustering, pseudo-identity labels are generated for the target domain data and then combined with the labeled source data to train the feature extractor network. This module differs from existing methods in that the number of cluster centers is fixed and a relabeling algorithm is presented to establish correspondence between the two groups of pseudo-identity labels produced by consecutive clustering iterations. By fusing the two modules, we set a new state of the art for unsupervised domain adaptation on vehicle re-ID and person re-ID benchmarks.
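The clustering-with-relabeling idea can be sketched as follows. This is a toy k-means with a fixed number of centers plus a greedy overlap-based relabeling; the dissertation's actual clustering method and relabeling algorithm may differ:

```python
import numpy as np

def kmeans(feats, k, iters=20, seed=0):
    """Minimal k-means with a fixed number of cluster centers k."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(feats[:, None] - centers[None], axis=2)
        labels = dists.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = feats[labels == j].mean(0)
    return labels

def relabel(prev, curr, k):
    """Map each new cluster id to the old id it overlaps most.

    Greedy stand-in for the dissertation's relabeling algorithm: it keeps
    pseudo-identity labels consistent across clustering iterations, so the
    classifier head trained on them is not reshuffled every iteration.
    """
    overlap = np.zeros((k, k), dtype=int)
    for p, c in zip(prev, curr):
        overlap[c, p] += 1
    mapping, used = {}, set()
    # Assign pairs in order of decreasing overlap.
    pairs = sorted(((c, p) for c in range(k) for p in range(k)),
                   key=lambda cp: -overlap[cp[0], cp[1]])
    for c, p in pairs:
        if c not in mapping and p not in used:
            mapping[c] = p
            used.add(p)
    return np.array([mapping[c] for c in curr])

# Two well-separated blobs standing in for target-domain features.
rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(0, 0.1, (10, 2)),
                   rng.normal(5, 0.1, (10, 2))])
labels_1 = kmeans(feats, 2, seed=0)          # first clustering iteration
labels_2 = kmeans(feats, 2, seed=3)          # next iteration, ids may permute
aligned = relabel(labels_1, labels_2, 2)     # ids made consistent with labels_1
```

In the full framework, the aligned pseudo-identity labels are combined with the labeled source data to retrain the feature extractor each iteration.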