Research on Image Generative Adversarial Models and Their Applications (图像生成对抗模型与应用研究)
张晨阳
2021-05-24
Pages: 120
Degree type: Doctoral
Abstract (Chinese)

As a vivid depiction of the objective world, images are one of humanity's primary sources of information. With the rapid development of artificial intelligence and machine learning, intelligent services built on image data have been successfully applied in many fields, including society, the economy, and people's livelihoods. Existing machine learning models with high accuracy and strong generalization mostly depend on large amounts of labeled training samples; however, image data in real-world scenarios typically exhibit complex distributions, highly imbalanced classes, and a severe shortage of annotations, so manual data collection and hand labeling cannot meet the analysis demands of massive data. Against this background, the generative adversarial network (GAN) emerged. It borrows the idea of the "zero-sum game" from game theory, in which the sum of the two players' payoffs is zero or a constant; the two players are the generative model and the discriminative model. Through iterative adversarial training, the generator fits the distribution of real images and produces new image samples such that the discriminator can no longer tell where a sample came from, reaching a Nash equilibrium.
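For reference, the adversarial game described above is conventionally written as the following minimax objective (the standard GAN formulation; the symbols G, D, p_data and p_z follow common usage rather than notation taken from the dissertation itself):

\min_{G}\max_{D} V(D,G)
  = \mathbb{E}_{x\sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z\sim p_{z}(z)}\big[\log\big(1-D(G(z))\big)\big]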

Starting from the theory of generative adversarial models, this dissertation studies adversarial learning mechanisms for images, taking weather-radar echo precipitation and person re-identification in intelligent surveillance as application backgrounds. It focuses on three typical GAN tasks, namely image sequence generation, image style transfer, and unsupervised domain adaptation, and explores theoretical and applied problems such as the automatic generation of high-quality labeled images and the accurate, efficient exploration of data distribution structure. The main innovative contributions are as follows:

1. A two-stage radar image sequence generative adversarial network (TsGAN) is proposed. First, to address the degraded generation quality caused by the long inter-frame intervals and abrupt content changes in radar echo image sequences, the model mines paired content and motion information in the sequences and improves the quality of the generated sequences through iterative adversarial training. Second, to address inter-frame continuity, a dynamics-enhanced generation module is proposed that decouples the motion information and further strengthens the dynamic characteristics of the generated sequences. Finally, for the discrimination step of the two-stage adversarial training, a three-channel Gram-matrix sequence discriminator is proposed that computes a ranking loss over the real radar images and the sequences generated in the two stages, so that the generated radar echo sequences are faithful in both motion and content (a minimal sketch of the Gram-matrix ranking idea follows this list). Experiments on the precipitation dataset of the Shenzhen Meteorological Bureau, China show that the generation performance of the proposed model significantly exceeds current state-of-the-art methods; the model has been deployed in the precipitation nowcasting and correction system of the China Meteorological Administration and has withstood practical engineering use.

2. A dual-alignment learning model with camera-aware image generation (DAL) is proposed. First, to address camera style discrepancies in unsupervised domain-adaptive person re-identification, a camera transfer matching module is constructed that generates style-transferred training samples, providing additional supervision and increasing information density; organizing the samples as matching pairs captures camera style information while reducing the extra computational overhead. Second, to obtain discriminative representations of pedestrian images in the target domain, a matching-pair pseudo-label distribution alignment module is constructed, and an efficient pseudo-label assignment mechanism drives the model to learn camera-invariant discriminative features. Finally, a matching-pair representation alignment module is constructed that maximizes mutual information to increase the feature similarity within each matching pair, strengthening the association between its images and effectively alleviating camera discrepancies. Experiments on three large-scale public benchmarks, Market1501, DukeMTMC-reID, and MSMT17, show that the recognition performance of the proposed model significantly exceeds state-of-the-art methods; the camera style transfer and dual-alignment mechanisms effectively reduce the distribution discrepancy of samples across camera views, lower the extra computational cost, and markedly improve re-identification accuracy.

3. A centroid discriminative adversarial domain alignment model (CDA-DA) is proposed. First, to address the domain distribution gap in unsupervised domain-adaptive person re-identification, a centroid discriminative learning module is proposed that jointly optimizes a center loss and a clustering loss to learn discriminative representations of the source and target domains. Second, to address the lack of overlap between source-domain and target-domain pedestrian features, a centroid-invariant adversarial learning module is proposed; adversarially aligning the first-order moments of pedestrian images adds a constraint to domain adaptation for re-identification and alleviates the distribution gap between datasets. Finally, to handle hard samples in the unsupervised clustering process, a dynamic assignment clustering algorithm is proposed that adaptively selects reliable pedestrian images for fine-grained identity clustering. Experiments on three mainstream public re-identification benchmarks, MSMT17, DukeMTMC-reID, and Market1501, show that CDA-DA effectively alleviates the domain distribution gap and significantly improves the discriminability of re-identification features.
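As referenced in contribution 1 above, the following is a minimal sketch (PyTorch) of Gram-matrix statistics over a feature map combined with a margin-based ranking loss, as one plausible reading of the three-channel Gram-matrix sequence discriminator; all function names, tensor shapes, and the margin value are illustrative assumptions, not the dissertation's actual implementation.

# Minimal sketch: Gram-matrix statistics of a feature map and a ranking loss
# that prefers real > stage-II output > stage-I output. Names, shapes and the
# margin are illustrative assumptions.
import torch
import torch.nn.functional as F

def gram_matrix(feat):
    # feat: (batch, channels, height, width) feature map of one frame/sequence
    b, c, h, w = feat.shape
    flat = feat.view(b, c, h * w)
    gram = torch.bmm(flat, flat.transpose(1, 2))   # (b, c, c) channel correlations
    return gram / (c * h * w)                      # normalize by feature size

def ranking_loss(score_real, score_stage2, score_stage1, margin=0.2):
    # Encourage the ordering: real > stage-II generation > stage-I generation.
    loss_a = F.relu(margin - (score_real - score_stage2)).mean()
    loss_b = F.relu(margin - (score_stage2 - score_stage1)).mean()
    return loss_a + loss_b

# Toy usage with random feature maps standing in for real and generated sequences.
real = torch.randn(4, 64, 32, 32)
fake1 = torch.randn(4, 64, 32, 32)   # stage-I generation
fake2 = torch.randn(4, 64, 32, 32)   # stage-II (refined) generation
score = lambda f: gram_matrix(f).mean(dim=(1, 2))  # scalar score per sample
print(ranking_loss(score(real), score(fake2), score(fake1)))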

Abstract (English)

As one of the most common forms of digital information, images permeate every aspect of social life. In recent years, with the rapid development of artificial intelligence and machine learning technologies, image data have been widely used in intelligent services across society, the economy, and people's livelihoods. Existing machine learning models with high accuracy and good generalization rely on large amounts of labeled training samples. However, image data from natural scenes usually have complex distributions and highly imbalanced categories, and severely lack supervised information; manual data collection and labeling cannot keep pace with the demands of such applications. The generative adversarial network (GAN) brings new vitality to this problem. Its idea comes from the "zero-sum game" in game theory, in which the sum of the two players' payoffs is zero or a constant; the two players in the model are the generator and the discriminator. Through iterative adversarial training, the generator effectively fits the distribution of real images and generates new image samples, while the discriminator ultimately cannot distinguish the source of the samples, so that the GAN reaches a Nash equilibrium.
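To make the adversarial training procedure concrete, here is a minimal, self-contained training-loop sketch in PyTorch on toy 2-D data; the architectures, learning rates, step count, and data are illustrative assumptions and are unrelated to the radar or re-ID models developed in this dissertation.

# Minimal GAN training-loop sketch (PyTorch) on toy 2-D Gaussian data.
# Every choice here is an illustrative assumption, not the dissertation's code.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))   # generator: noise -> sample
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))   # discriminator: sample -> logit
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    real = torch.randn(64, 2) * 0.5 + 2.0      # "real" data: a shifted Gaussian
    noise = torch.randn(64, 8)

    # Discriminator step: real samples labeled 1, generated samples labeled 0.
    fake = G(noise).detach()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make the discriminator label generated samples as real.
    g_loss = bce(D(G(noise)), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()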

Starting from GAN theory, this dissertation takes weather-radar precipitation and person re-identification in surveillance monitoring as its application backgrounds. It studies three typical GAN tasks, namely image sequence generation, image style translation, and unsupervised domain adaptation, and attempts to automatically generate labeled high-quality images and to explore the data distribution structure efficiently and effectively. The main contributions can be summarized as follows:

1. A two-stage radar image sequence generative adversarial network (TsGAN) is proposed to generate radar reflectivity image sequences. Considering the relatively long intervals and abrupt content changes between consecutive radar images caused by radar volume scanning, the model mines the paired content and motion features in radar image sequences and uses iterative adversarial training to improve the quality of the generated sequences. On this basis, a sequence-enhanced generation module that decouples the motion information is proposed to describe the relationship between adjacent frames with long intervals. To discriminate among the raw data and the sequences generated in stage I and stage II, a three-channel Gram-matrix discriminator is proposed that optimizes a ranking loss. Experiments on the radar reflectivity dataset from Shenzhen, China confirm that TsGAN achieves superior performance in both content and motion generation, with promising quantitative and qualitative results compared with competing methods. The proposed method has been deployed in the precipitation nowcasting and correction system of the China Meteorological Administration.

2. A dual-alignment learning framework with camera-aware image generation (DAL) is proposed to reduce camera variance and enhance the discrimination ability of person re-identification (re-ID) models. Specifically, we propose a camera transfer matching module to generate additional training images in different camera styles; in this manner, camera style information is captured while the extra computational cost is kept low. A pseudo-label distribution alignment module is then proposed to enforce the discrimination ability of the re-ID model in the target domain. In addition, to further encourage camera invariance, the feature representations of each matching pair are aligned by maximizing their mutual information. Extensive experiments and ablation studies on three large-scale benchmarks, Market1501, DukeMTMC-reID, and MSMT17, demonstrate the competitive performance of the proposed method. DAL reduces camera variance, improves transferability efficiently and effectively, and enforces the discrimination ability of the re-ID model in the target domain.

3. A centroid discriminative adversarial domain alignment model (CDA-DA) is proposed to reduce the gap between the source and target domains in unsupervised person re-ID. The basic idea is to perform unsupervised clustering on the unlabeled target dataset while conducting supervised classification on the labeled source dataset. To this end, a novel centroid discriminative learning module, which uniformly formulates the center loss and the clustering loss, is proposed to learn discriminative source and target representations simultaneously (a minimal sketch of this joint objective is given below). We introduce adversarial learning to align the first-order moments of the samples, providing a relaxed constraint for re-ID domain adaptation. For the hard samples encountered during re-ID training, we propose a dynamic assignment mechanism that adaptively selects reliable samples for clustering fine-grained identities. Experiments on three public re-ID datasets show that CDA-DA effectively reduces the domain distribution gap between the source and target domains and further enforces the discrimination ability of the re-ID model.
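As a rough illustration of the joint objective named in contribution 3, the sketch below combines a standard center loss (pulling features toward their class centroid) with a cross-entropy term over labels or clustering-derived pseudo-labels; the weighting, shapes, and all names are assumptions rather than the CDA-DA formulation itself.

# Sketch of a centroid-discriminative objective: a standard center loss plus a
# cross-entropy term on (pseudo-)labels. Weighting and shapes are assumptions.
import torch
import torch.nn.functional as F

def center_loss(features, labels, centroids):
    # features: (batch, dim); labels: (batch,); centroids: (num_classes, dim)
    return ((features - centroids[labels]) ** 2).sum(dim=1).mean()

def joint_objective(features, logits, labels, centroids, weight=0.1):
    # Cross-entropy keeps classes separable; the center term tightens clusters.
    return F.cross_entropy(logits, labels) + weight * center_loss(features, labels, centroids)

# Toy usage: source samples would use ground-truth labels, target samples would
# use pseudo-labels obtained from clustering (clustering itself is not shown).
feats = torch.randn(16, 128)
logits = torch.randn(16, 10)
labels = torch.randint(0, 10, (16,))
centroids = torch.randn(10, 128)
print(joint_objective(feats, logits, labels, centroids))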

Keywords: Generative Adversarial Network; Deep Learning; Image Sequence Generation; Domain Adaptation; Person Re-identification
Language: Chinese
Research direction (sub-direction classification): Machine Learning
State Key Laboratory planned direction classification: Other
Document type: Dissertation
Item identifier: http://ir.ia.ac.cn/handle/173211/44944
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems, Artificial Intelligence and Machine Learning (杨雪冰) Technical Team
Recommended citation:
GB/T 7714
张晨阳. Research on Image Generative Adversarial Models and Their Applications [D]. Institute of Automation, Chinese Academy of Sciences, 2021.
Files in this item:
132674595571865000.p (8500 KB) · Dissertation · Open Access · License: CC BY-NC-SA