Research on Image Manipulation Based on Visual Composition Modeling (基于视觉构图建模的图像编辑问题研究)

	Research on Image Manipulation Based on Visual Composition Modeling
	李德榜
	2021-05-27
Pages	146
Degree type	Doctoral
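Abstract (translated from Chinese)	This dissertation studies image manipulation based on visual composition modeling, that is, modeling and evaluating the visual composition of an image in order to carry out subsequent manipulation operations. It concentrates on two concrete tasks: image cropping and view recommendation (recommending sub-regions of an image). Both are basic but important operations in image manipulation and have great practical value in photography, art design, film and television processing, and printing. After a long period of development, algorithms for these two tasks have made some progress, but they still have clear shortcomings. First, how to use large-scale cheaply annotated or unannotated data for weakly supervised or unsupervised training, so as to strengthen a model's generalization across scenarios, is a question worth studying. Because bounding-box annotations are missing in these settings, however, most such methods obtain their final result through sliding-window brute-force search, which makes the search very inefficient. Second, a model needs to be more adaptive to different requirements so that it can produce a corresponding result for each of them. Third, the relative relations between different regions of an image are important for the overall evaluation, and how to exploit them is also worth studying. To address these problems, this dissertation carries out the following work. (1) It proposes an automatic aesthetic image cropping method based on reinforcement learning and adversarial learning. This work first uses large-scale unannotated data and prior knowledge to build a large number of training pairs and trains an aesthetics evaluation model on them. Unlike sliding-window brute-force search, it searches for the optimal cropping region with a reinforcement-learning-based strategy. It designs an action space for the search that contains a series of actions adjusting the position, shape, and size of the cropping window, together with a termination action. The initial cropping window is set to the whole input image, and the actions adjust the window during the search until the model selects the termination action. A reward function computed from the output score of the aesthetics evaluation model guides the model, during training, to find cropping regions with high aesthetic scores within the huge search space. Adversarial learning is also introduced during training, so that the reinforcement-learning-based cropping model and the aesthetics evaluation model improve by competing against each other. Experiments show that the proposed model outperforms related sliding-window-based methods in both performance and speed. (2) It proposes an automatic image cropping method for specified shapes based on meta-learning. This work treats different shape requirements as different environments and uses meta-learning so that the model can adapt to them quickly. The model consists of a base model and two meta-learners (sub-networks); given a shape requirement, the meta-learners predict the corresponding parameters for the base model. Because the base model's parameters change with the shape requirement, the model predicts different results for the same image under different requirements, and the experiments confirm this behavior. (3) It proposes a view recommendation method based on mining the relative relations between regions. The methods above predict only one cropping result per image or per shape requirement, but the sub-regions of an image with high composition quality are often not unique, and some applications need the model to recommend several sub-regions, so this dissertation further studies view recommendation. It proposes a relation-graph-based model that mines the relative relations between different regions of an image and uses the resulting relation features to help the model better predict the composition quality of different sub-regions. Experiments show that mining these relation features significantly improves the model's performance.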
Abstract (English)	This dissertation studies image manipulation based on visual composition modeling, that is, modeling and evaluating the visual composition of an image to guide subsequent manipulation operations. It focuses on two tasks: image cropping and view recommendation. Both are basic but important operations in image manipulation, with substantial value in photography, art design, film processing, and printing. After years of development, image cropping and view recommendation methods have made progress but still have clear shortcomings. First, how to use large-scale cheaply labeled or unlabeled data for weakly supervised or unsupervised training, so as to improve a model's generalization across scenarios, is a problem worth studying. Because bounding-box annotations are unavailable in these settings, however, most such methods rely on sliding-window brute-force search to obtain the final result, which makes the search very inefficient. Second, a model should adapt to different requirements and generate a corresponding result for each. Third, the relative relations between different regions of an image matter for the final evaluation, and how to exploit them is also worth studying. In response to these problems, this dissertation carries out the following work: (1) Aesthetics-aware adversarial reinforcement learning for image cropping. This work first uses large-scale unlabeled data and prior knowledge to construct a large number of training pairs and trains an aesthetics evaluation network on them. Instead of sliding-window brute-force search, it uses a reinforcement-learning-based strategy to search for the optimal cropping region. It designs an action space for the search that contains a series of actions adjusting the position, shape, and size of the cropping window, plus a termination action. The initial cropping window is the entire input image, and the actions adjust the window until the model selects the termination action. A reward computed from the output score of the aesthetics evaluation network guides the model, during training, toward cropping windows with high aesthetic scores within the huge search space. Adversarial learning is also introduced during training, so that the reinforcement-learning-based cropping model and the aesthetics evaluation network improve through competing against each other. Experiments show that the proposed model outperforms related sliding-window-based methods in both performance and speed. (2) A meta-learning-based aspect-ratio-specific image cropping method. This work treats different shape requirements as different environments and uses meta-learning so that the model can adapt to them quickly. The model consists of a base model and two meta-learners (sub-networks); given a shape requirement, the meta-learners predict the corresponding parameters for the base model. Because the base model's parameters change with the shape requirement, the model predicts different results for the same image under different requirements, and the experiments confirm this behavior. (3) A view recommendation method based on mining the relative relations between regions. The methods above predict only one cropping result per image or per shape requirement, but an image often contains more than one view with high composition quality, and some applications require several recommended views. This dissertation therefore further studies view recommendation. It proposes a graph-based model that mines the relative relations between different regions of an image and uses the mined relation features to better predict the composition quality of different views. Experiments show that mining these relation features significantly improves the model's performance.
Keywords	visual composition modeling; image manipulation; visual aesthetics; image cropping; view recommendation
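Discipline	Engineering::Control Science and Engineering
Language	Chinese
Seven major directions: sub-direction classification	Image and video processing and analysis
Document type	Thesis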
Item identifier	http://ir.ia.ac.cn/handle/173211/44361
Collection	复杂系统认知与决策实验室_智能系统与工程
Corresponding author	李德榜
Recommended citation (GB/T 7714)	李德榜. 基于视觉构图建模的图像编辑问题研究[D]. 中国科学院自动化研究所. 中国科学院大学, 2021.

Files in this item
File name/size	Document type	Version type	Access type	License
基于视觉构图建模的图像编辑问题研究.pd (43095 KB)	Thesis		Open access	CC BY-NC-SA
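
Illustrative note on work (1) of the abstract: the search loop described there (start from the whole image, apply window-adjusting actions, stop at a termination action) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the dissertation's implementation. The action names, the step size `STEP`, the minimum side `MIN_SIDE`, and `toy_score` are all hypothetical. A greedy rule stands in for the learned reinforcement-learning policy, a toy function stands in for the aesthetics evaluation network, and adversarial training is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

STEP = 0.05      # assumed step size, as a fraction of the image side
MIN_SIDE = 0.2   # assumed minimum crop width and height


@dataclass(frozen=True)
class Box:
    """Crop window in normalized image coordinates (0..1)."""
    x0: float
    y0: float
    x1: float
    y1: float


# Hypothetical action set; each entry is (dx0, dy0, dx1, dy1).
ACTIONS: Dict[str, Tuple[float, float, float, float]] = {
    "move_left": (-STEP, 0.0, -STEP, 0.0),    # position
    "move_right": (STEP, 0.0, STEP, 0.0),
    "move_up": (0.0, -STEP, 0.0, -STEP),
    "move_down": (0.0, STEP, 0.0, STEP),
    "shrink": (STEP, STEP, -STEP, -STEP),     # size
    "narrower": (STEP, 0.0, -STEP, 0.0),      # shape
    "shorter": (0.0, STEP, 0.0, -STEP),
}
TERMINATE = "terminate"


def apply_action(box: Box, name: str) -> Box:
    """Return the adjusted box, or the unchanged box if the move is invalid."""
    dx0, dy0, dx1, dy1 = ACTIONS[name]
    new = Box(box.x0 + dx0, box.y0 + dy0, box.x1 + dx1, box.y1 + dy1)
    inside = new.x0 >= 0.0 and new.y0 >= 0.0 and new.x1 <= 1.0 and new.y1 <= 1.0
    big_enough = (new.x1 - new.x0) >= MIN_SIDE and (new.y1 - new.y0) >= MIN_SIDE
    return new if inside and big_enough else box


def search_crop(score: Callable[[Box], float],
                max_steps: int = 50) -> Tuple[Box, List[str]]:
    """Greedy stand-in for a learned policy: take the best-scoring action,
    and emit the termination action once no action improves the score."""
    box = Box(0.0, 0.0, 1.0, 1.0)  # initial window is the whole image
    taken: List[str] = []
    for _ in range(max_steps):
        best_name = max(ACTIONS, key=lambda n: score(apply_action(box, n)))
        best_box = apply_action(box, best_name)
        if score(best_box) <= score(box):
            taken.append(TERMINATE)
            break
        box = best_box
        taken.append(best_name)
    return box, taken


def toy_score(box: Box) -> float:
    """Toy stand-in for an aesthetics score: closeness to a fixed target crop."""
    target = Box(0.2, 0.2, 0.8, 0.8)
    return -(abs(box.x0 - target.x0) + abs(box.y0 - target.y0)
             + abs(box.x1 - target.x1) + abs(box.y1 - target.y1))


final_box, actions = search_crop(toy_score)
print(actions)
print(final_box)
```

Each step scores only the seven candidate moves, so the number of evaluated windows grows with the number of steps rather than with the number of possible windows, which is the efficiency argument the abstract makes against sliding-window search.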