Multi-Domain Artistic Image Generation Based on Adversarial Learning

	Multi-Domain Artistic Image Generation Based on Adversarial Learning
	Lin Minxuan (林敏轩)
	2021-05-29
Pages	86
Degree type	Master's
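Abstract (translated from the Chinese)	With the rapid development of computer and Internet technology, images have become a mainstream information carrier, creating demand for image generation and editing techniques. In recent years, image generation based on generative adversarial networks has advanced considerably and has strongly influenced artistic image generation. Artistic image generation aims to fuse an input natural picture with a specified style, producing a stylized result that carries the textures of the target style while preserving the content of the natural picture; it is widely used in audio and video, entertainment, and social applications. For artistic images, each painter's body of work naturally forms a domain. However, because the number of domains is large and model training is time-consuming, training a separate generative model for each painter domain is costly, so the ability of a single model to generate across multiple domains has become a research focus. Image translation is a key technique for such cross-domain image generation tasks: it uses a unified generative model for image-to-image generation and plays an important role in multi-domain artistic image generation. This thesis focuses on artistic image generation under multi-domain conditions and proposes an adversarial-learning-based style alignment module and two multi-domain image translation frameworks. By considering how style and content information are fused under different guidance conditions, it addresses three challenges: multi-domain artistic image generation guided by joint conditions, construction of a multi-domain, multimodal style space, and multi-domain, multimodal image stylization under multiple guidance conditions. The contributions are: (1) A multi-domain image translation framework guided by joint conditions. Drawing on attributes specific to paintings, it takes painter, period, and genre as control variables to stylize natural pictures across multiple domains. The model adopts an asymmetric cyclic generation architecture and uses adversarial training to achieve multi-domain artistic image generation. (2) A style alignment module based on adversarial learning. The thesis studies style space construction when both exemplar guidance and random-sampling guidance must be supported, and uses the style alignment module in place of the earlier approach of constructing the style space with relative entropy (Kullback-Leibler divergence), alleviating possible mode collapse. (3) A distribution-aligned multi-domain, multimodal image translation framework. Two feature extraction branches extract style and content information separately, and a dual-path adversarial architecture performs adversarial training in the style space and the pixel space respectively, enabling multi-domain, multimodal artistic image generation under multiple guidance conditions.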
English abstract	With the rapid development of computer and Internet technology, images have become a mainstream information carrier, which has created demand for image generation and editing technologies. In recent years, image generation based on generative adversarial networks has made great progress and has strongly influenced artistic image generation. Artistic image generation aims to fuse an input natural picture with a specified style, generating a stylized result that has the texture of the target style while retaining the content of the natural picture. It is widely used in audio, video, entertainment, and social applications. For artistic images, the collected works of each painter naturally form a domain. However, because the number of domains is large and model training is time-consuming, it is costly to train a separate generation model for each painter domain. The multi-domain generation capability of a single model has therefore become a research hotspot. Image translation is a key technology for this kind of cross-domain image generation task. It uses a unified generation model to handle image-to-image generation and plays an important role in multi-domain artistic image generation. This thesis focuses on artistic image generation under multi-domain conditions and proposes a style alignment module based on adversarial learning and two multi-domain image translation frameworks. Considering how style information and content information are fused under different guidance conditions, this thesis addresses the challenges of joint condition-guided multi-domain artistic image generation, multi-domain and multimodal style space construction, and multi-domain and multimodal image stylization under multi-guidance conditions. The contributions of this thesis include: (1) A multi-domain image translation framework guided by joint conditions. The framework uses attributes specific to paintings, such as painter, period, and genre, as control variables to stylize natural pictures in multiple domains. The model adopts an asymmetric cycle generation architecture and realizes multi-domain artistic image generation through adversarial training. (2) A style alignment module based on adversarial learning. The thesis studies style space construction when both exemplar guidance and random sampling guidance must be supported. The style alignment module replaces the previous method of constructing the style space with the Kullback-Leibler divergence and alleviates possible mode collapse. (3) A multi-domain and multimodal image translation framework based on aligned distributions. Style and content information are extracted separately through two feature extraction branches, and a two-way adversarial architecture conducts adversarial training in the style space and the pixel space respectively. The model thus achieves multi-domain and multimodal artistic image generation under multi-guidance conditions.
Keywords	image translation; stylization; artistic image generation; multi-domain generation; generative adversarial networks
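Language	Chinese
Seven major directions: sub-direction classification	Image and video processing and analysis
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/45048
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems_Multimedia Computing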
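Recommended citation (GB/T 7714)	Lin Minxuan. Multi-Domain Artistic Image Generation Based on Adversarial Learning[D]. Fifth Meeting Room, 3rd Floor, Intelligence Building, Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2021.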

Files in this item
File name/size	Document type	Version type	Access type	License
基于对抗学习的多域艺术图像生成_林敏轩.（46428KB）	Thesis		Open access	CC BY-NC-SA