Research on Aesthetics and Detection Problems in Image Cropping

	Research on Aesthetics and Detection Problems in Image Cropping
	jia geng yun
	2022-05-20
Pages	138
Degree type	Doctoral
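Chinese abstract (translated)	Image cropping is one of the basic image editing operations and serves several purposes. On the one hand, cropping can improve compositional aesthetics and is widely used in many fields. Research on automatic cropping for aesthetic enhancement and its related problems can broaden and deepen the applications of image cropping and advance aesthetic intelligence, so it has significant practical and theoretical value. On the other hand, cropping can alter the meaning of an image. The risk that this capability is abused creates a need for techniques that automatically detect image cropping. Although related work has developed rapidly over the past decade or so, many difficulties remain. For aesthetics-related research in image cropping, the core challenge stems from the fact that aesthetic quality blends the subjective and the objective and is coupled with real-world scenarios. These properties raise problems at different levels, mainly the following three. First, most methods ignore some real-world scene conditions, so the information received and processed by the model is inconsistent with that received by humans. Second, existing models make ideal-condition assumptions about crop positions and crop numbers and therefore cannot be applied in some scenarios. Third, the subjective and objective uncertainty of the aesthetic perception process is not fully exploited. For the detection problem, although cropping leaves many kinds of traces, most work does not sufficiently analyze the differences and relations among them, so multiple traces are not used together efficiently. To address these problems, this thesis carries out the following research: 1. A theme-guided full-resolution image aesthetic assessment model is proposed. To address the inconsistency between the aesthetic information seen by the model and that in the original image, this thesis proposes a scheme that combines image border padding with region pooling and introduces shape features as a supplement. To address the inconsistency of aesthetic standards caused by missing theme information, this thesis analyzes in detail how theme information affects aesthetic assessment and proposes a theme-guided assessment method that fuses theme information with image features through an attention mechanism, thereby introducing a theme-specific criterion bias. Experiments show that the proposed method achieves outstanding results on three tasks: aesthetic distribution learning, aesthetic score regression, and aesthetic classification. 2. An image cropping model oriented toward globality and diversity is proposed. Existing models set priors on crop number and position under ideal conditions, which limits their application; to address this, this thesis proposes, for the first time, a cropping model that accounts for both globality and diversity. In this model, a set of crops regressed from multiple learnable anchors is matched with the ground-truth crops. A validity classifier trained on the matching results selects a valid subset from all regressed crops, which makes it possible to regress an arbitrary number of crops. To resolve the inconsistency between validity probability and crop quality, this thesis also introduces a label smoothing strategy that combines quality guidance with self-distillation. Experimental analysis shows that the model can produce multiple crops of high aesthetic quality from the entire crop coordinate space and achieves the best level on multiple metrics. 3. An image cropping model based on uncertainty modeling is proposed. To address uncertainty in human aesthetic perception, this thesis analyzes two types of uncertainty present in cropped images and models them separately. For coordinate-space uncertainty, the model treats candidate crop coordinates as samples from a triangular distribution in coordinate space whose expectation is the originally given candidate crop coordinates. For pixel-space uncertainty, it proposes unified modeling with a multi-dimensional Gaussian distribution in the embedding space of a deep neural network and introduces an ordinal constraint on the feature distributions. Combined with feature normalization guided by the original image, this constraint effectively promotes ordinal consistency between features and crop quality scores. Experiments show that the model effectively improves crop quality assessment. 4. An image cropping detection model based on a multi-scale feature hybrid network is proposed. Because the traces left by cropping are diverse and vary greatly in scale, this thesis proposes a multi-scale feature hybrid Transformer model. The model uses a convolution module to extract the optical traces that cropping leaves in pixel details and a Transformer network to extract large-scale photographic composition traces, and it introduces a patch location classification task to avoid losing pixel-detail traces. Experiments show that the model can effectively detect cropped images.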
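The matching step in contribution 2, where regressed crops are matched with ground-truth crops and the results supervise a validity classifier, might be sketched as follows. The abstract does not state the matching rule, so greedy one-to-one matching by intersection over union (IoU) is an assumption here, as are the names `iou` and `validity_labels` and the `(x1, y1, x2, y2)` box layout.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def validity_labels(pred: List[Box], gt: List[Box]) -> List[int]:
    """Greedy one-to-one matching (an assumed stand-in for the thesis's rule).

    Returns one label per predicted crop: 1 if it was matched to a
    ground-truth crop, 0 otherwise. Such labels could then be used as
    training targets for a validity classifier.
    """
    # All (IoU, prediction index, ground-truth index) triples, best first.
    pairs = sorted(
        ((iou(p, g), i, j) for i, p in enumerate(pred) for j, g in enumerate(gt)),
        reverse=True,
    )
    labels = [0] * len(pred)
    used_gt = set()
    for score, i, j in pairs:
        if score <= 0.0:
            break  # remaining pairs do not overlap at all
        if labels[i] == 0 and j not in used_gt:
            labels[i] = 1
            used_gt.add(j)
    return labels
```

With one ground-truth crop and three predictions, only the best-overlapping prediction is labeled valid, so the number of valid crops follows the number of ground-truth crops rather than a fixed quantity prior.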
English abstract	Image cropping is a basic image editing operation with many functions. On the one hand, cropping can improve the compositional aesthetic quality of an image, so it is widely applied in many areas. Research on image cropping for aesthetic enhancement and related problems can broaden and deepen its applications and help machines better understand human perception; it therefore has important practical and theoretical value. On the other hand, cropping can modify the meaning of an image. This function carries a risk of abuse, which calls for research on automatic image cropping detection. Although related work has developed rapidly over the past ten years, many deficiencies and problems remain. For the aesthetic problems, the key challenge is that aesthetics is both objective and subjective and is coupled with real scenarios. As a result, problems arise at different levels. First, most methods ignore some scene context factors, resulting in information inconsistency between models and humans. Second, most image cropping methods rely on ideal-condition assumptions about the number and locations of crops, making them inapplicable in some scenarios. Third, little attention is paid to the objective and subjective uncertainty in the human aesthetic perception process. As for the detection problem, although image cropping leaves many traces, current work does not sufficiently analyze the differences and relations between them, so the diverse traces are used neither effectively nor efficiently. In response to these problems, this thesis carries out the following research: 1. A theme-aware aesthetic quality assessment model for full-resolution photos is proposed. To deal with the inconsistency between the information received by the model and that observed by humans, this thesis proposes a method that combines image padding and RoM (region of image) pooling. Shape information is also introduced as a complement. For the inconsistency in aesthetic criteria, the influence of theme information is analyzed in detail. This thesis proposes a theme-aware aesthetic evaluation method that integrates theme information with image features through an attention module, thereby introducing a theme criterion bias. Experimental results show that the proposed model achieves outstanding results in three tasks: aesthetic distribution learning, aesthetic score regression, and aesthetic classification. 2. An image cropping model that targets both globality and diversity is proposed. To address the problem that existing methods set crop quantity or position priors under ideal conditions, this thesis proposes, for the first time, a model that achieves both globality and diversity. A set of crops regressed from multiple learnable anchors is matched with the ground-truth crops, and a classifier is trained on the matching results to select a valid subset from all the predictions. Thus, any number of crops can be regressed. Furthermore, two label smoothing strategies, quality guidance and self-distillation, are introduced to deal with the inconsistency between validity probability and crop quality. Experimental analysis shows that the model can produce multiple cropping results with high aesthetic quality from the entire coordinate space and achieves the best level on multiple metrics. 3. An image cropping model based on uncertainty is proposed to address the uncertainty problem in human aesthetic perception. This thesis analyzes two kinds of uncertainty in image cropping and models them separately. For the uncertainty in coordinate space, the model regards the candidate crop coordinates as samples from a triangular distribution whose expectations are the coordinates given in the datasets. For the uncertainty in pixel space, a multi-dimensional Gaussian distribution in the embedding space of a deep neural network is used to integrate the various uncertainty factors. In addition, this thesis introduces an ordinal constraint on the feature distributions. Combined with image-guided feature normalization, this constraint effectively promotes ordinal consistency between crop quality scores and features. Experimental results show that this model effectively improves the performance of crop quality assessment. 4. An image cropping detection model based on multi-scale features is proposed. To address the problem of diverse and multi-scale cropping traces, this thesis proposes a multi-scale feature hybrid Transformer model for image cropping detection. The model extracts the optical traces that cropping leaves in pixel details through a convolution module and extracts the large-scale photographic composition traces through a Transformer network. An auxiliary patch location classification task is introduced to avoid the loss of pixel details. Experiments show that the model can effectively detect cropped images.
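The coordinate-space uncertainty in contribution 3 can be illustrated with a short Python sketch. It treats each annotated crop coordinate as the expectation of a symmetric triangular distribution and draws a perturbed crop from it. The function name `sample_crop`, the `half_width` parameter, and the independent per-coordinate sampling are assumptions for illustration; the abstract does not give the distribution's support or say how the samples are used in training.

```python
import random
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def sample_crop(box: Box, half_width: float, rng: random.Random) -> Box:
    """Draw one perturbed crop around an annotated candidate crop.

    Each coordinate c is sampled from a triangular distribution on
    [c - half_width, c + half_width] with its mode at c. Because the
    distribution is symmetric, its expectation equals c, the given
    coordinate. half_width is a hypothetical hyperparameter and should be
    small relative to the crop size so that x1 < x2 and y1 < y2 still hold.
    """
    x1, y1, x2, y2 = (
        rng.triangular(c - half_width, c + half_width, c) for c in box
    )
    return (x1, y1, x2, y2)


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_crop((10.0, 20.0, 110.0, 220.0), 4.0, rng))
```

Averaging many such samples recovers the annotated coordinates, which is the property the abstract states for the triangular distribution.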
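Keywords	image cropping; aesthetic assessment; aesthetic enhancement; image cropping detection
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/48555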
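Collection	Graduates_Doctoral Dissertations; Institute of Automation, Chinese Academy of Sciences, Pattern Recognition Laboratory, Graduates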
Recommended citation (GB/T 7714)	jia geng yun. Research on Aesthetics and Detection Problems in Image Cropping[D]. University of Chinese Academy of Sciences, 2022.

Files in this item
File name/size	Document type	Version type	Access type	License
博士论文提交版.pdf (23957 KB)	Thesis		Restricted access	CC BY-NC-SA