English Abstract | Visual aesthetic assessment of images is a highly challenging cognitive task in computer vision. Its ultimate goal is to enable an artificial intelligence agent to perceive, analyze, and make decisions based on the visual aesthetics of input images. Aesthetic assessment has great value in several applications, such as image retrieval, image generation, image processing, and machine emotional intelligence. Over the past few decades, research on this topic has made substantial progress. Nevertheless, owing to its inherently subjective nature, image aesthetic assessment remains largely an open and challenging research problem.
In this thesis, we attempt to address several fundamental issues in this task: the low efficiency of exploiting image-level aesthetic annotations in visual aesthetic modeling, the lack of proper regularization methods for image aesthetic assessment, and the inability to exploit massive unlabeled images to learn aesthetic-aware visual features. We propose new and effective solutions that mitigate these issues and yield practical knowledge and insights. With practical applications in mind, we also consider two common specific image domains: face images and food photos.
Specifically, the technical contributions of this thesis are as follows:
1. We propose an effective attention-based multi-patch aggregation method. Because image-level aesthetic annotations do not provide full information for reasoning about the visual aesthetics of input images, mainstream methods resort to using other image attributes to assess visual aesthetics. However, these auxiliary attributes rely on expert design or expensive manual annotation, which is neither flexible nor efficient. To make better use of image-level aesthetic annotations, we propose an attention-based multi-patch aggregation method: during optimization, we assign each image patch an adaptive training weight in a data-driven, end-to-end manner, allocating more computational resources to instances with low target confidence. Numerical results on the AVA benchmark indicate that our approach achieves the best performance, even surpassing approaches that use auxiliary information.
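The adaptive weighting idea above can be sketched as follows. This is a minimal illustration, not the thesis's actual model: the softmax-over-loss weighting and the `temperature` parameter are assumed stand-ins for whatever attention mechanism the method actually learns.

```python
import numpy as np

def adaptive_patch_weights(patch_losses, temperature=1.0):
    """Give larger weights to patches with higher loss (i.e., low target
    confidence), so harder patches receive more training signal.
    The softmax form and temperature are illustrative choices."""
    scaled = np.asarray(patch_losses, dtype=float) / temperature
    scaled -= scaled.max()              # subtract max for numerical stability
    w = np.exp(scaled)
    return w / w.sum()

def aggregate_loss(patch_losses, temperature=1.0):
    """Aggregate per-patch losses into one image-level training loss."""
    w = adaptive_patch_weights(patch_losses, temperature)
    return float(np.dot(w, patch_losses))
```

Because the weights concentrate on high-loss patches, the aggregated loss sits above the plain mean, which is what "allocating more computation to low-confidence instances" amounts to in this toy form.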
2. We propose an effective method to exploit unlabeled images for learning aesthetic-aware visual features. Because aesthetic annotation is costly, a desirable idea is to exploit massive unlabeled data to learn aesthetic-aware features. To the best of our knowledge, no prior work has explored this valuable idea. Building on the relation between negative visual aesthetics and several image-editing manipulations, we design an effective self-supervised learning scheme to learn aesthetic-aware features. For comparison, we also experiment with other typical self-supervised methods. Quantitative results indicate that our method outperforms these counterparts on three image aesthetic assessment benchmarks, even surpassing the direct use of the $1000$-way labels of ImageNet. We thereby verify the promise of using unlabeled data for image aesthetic assessment.
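A pretext task of this kind can be sketched as below. This is an assumed illustration: the three degradations (box blur, over-exposure, additive noise) and the idea of predicting which edit was applied are plausible stand-ins, not necessarily the manipulations or objective used in the thesis.

```python
import numpy as np

def degrade(img, op_id, rng):
    """Apply one aesthetics-degrading edit to a grayscale image; the
    pretext label is op_id. The three manipulations are illustrative."""
    img = img.astype(float)
    if op_id == 0:                      # crude 3x3 box blur
        pad = np.pad(img, 1, mode="edge")
        img = sum(pad[i:i + img.shape[0], j:j + img.shape[1]]
                  for i in range(3) for j in range(3)) / 9.0
    elif op_id == 1:                    # over-exposure
        img = np.clip(img * 1.8, 0, 255)
    else:                               # additive Gaussian noise
        img = np.clip(img + rng.normal(0, 25, img.shape), 0, 255)
    return img

def make_pretext_pair(img, rng):
    """Sample a manipulation; a network would then be trained to predict
    op_id (and/or to rank the original above the degraded copy),
    yielding aesthetic-aware features without human labels."""
    op_id = int(rng.integers(0, 3))
    return degrade(img, op_id, rng), op_id
```

The supervision is free: every unlabeled photo yields a (degraded image, edit label) pair, and distinguishing the edits forces the features to be sensitive to aesthetics-relevant factors such as sharpness, exposure, and noise.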
3. We establish the first large-scale dataset for food image aesthetic assessment and propose an effective regularization strategy to improve generalization. At present, only a few studies have addressed food image aesthetic assessment, despite its great practical potential. To support research on this topic, we establish the first large-scale dataset for food image aesthetic assessment, the Gourmet Photography Dataset (GPD), which contains 24,000 food photos with corresponding aesthetic annotations. We also design a simple yet effective regularization method that combats overfitting for better generalization. Extensive experiments demonstrate the value of the GPD and the effectiveness of our regularization method.
4. We evaluate the quality of face alignment results without their corresponding ground truth, via a learning-to-rank approach. Face alignment has been an area of intense research in computer vision. The outputs of various face alignment methods are often image-dependent or somewhat random, so it is desirable to evaluate the quality of face alignment results without ground truth; few studies have addressed this difficult problem. We do so by designing a feasible feature extraction scheme that measures the quality of face alignment results. The features are then used in various machine learning algorithms to rank different face alignment results. Experimental results show that our method is promising for ranking face alignment results and can pick out good ones. With our proposed evaluation model, we can enhance the overall performance of a face alignment method under a random strategy at moderate cost.
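The learning-to-rank step can be sketched with a RankSVM-style linear ranker trained on pairwise comparisons. This is an assumed, simplified stand-in: the features, the linear model, and the perceptron-style hinge updates are illustrative, not the feature scheme or learners used in the thesis.

```python
import numpy as np

def train_pairwise_ranker(feats, scores, lr=0.1, epochs=200, margin=1.0):
    """Learn a weight vector w such that w @ x_i > w @ x_j whenever
    alignment result i is better than result j.
    feats: (n, d) quality features extracted from alignment results.
    scores: ground-truth quality, used only at training time."""
    n, d = feats.shape
    w = np.zeros(d)
    # All ordered pairs (better, worse) derived from training scores.
    pairs = [(i, j) for i in range(n) for j in range(n)
             if scores[i] > scores[j]]
    for _ in range(epochs):
        for i, j in pairs:
            diff = feats[i] - feats[j]
            if w @ diff < margin:       # hinge: update only violated pairs
                w += lr * diff
    return w

def rank_results(w, feats):
    """Return indices of candidate alignment results, best first,
    using only features -- no ground truth needed at test time."""
    return list(np.argsort(-(feats @ w)))
```

At test time only `rank_results` is needed, which is the point: among several candidate alignments of the same face (e.g., from random restarts), the ranker picks a good one without any ground-truth landmarks.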