基于深度学习的真实场景单张图像高光去除研究 (Deep-Learning-Based Specular Highlight Removal from Single Images of Real-World Scenes)
Author: 吴仲琦 (Zhongqi Wu)
Date: 2022-05-22
Pages: 126
Degree type: Doctoral
Abstract (Chinese, translated)

With the continual evolution of diverse digital hardware devices and computational photography techniques, images have become one of the most important carriers of visual information. Images captured in real-world scenes usually contain specular highlights, which to a certain extent weaken or even destroy the intrinsic color and texture information of objects and significantly affect subsequent image understanding and applications such as image segmentation, object recognition, and 3D reconstruction. Detecting and removing highlights while restoring complete texture details has therefore become a research hotspot in computer vision, image processing, and multimedia computing, and the topic carries both theoretical significance and practical value.

Specular highlight detection and removal has been studied for decades, yet existing knowledge-driven traditional methods and data-driven deep learning methods still struggle to effectively remove specular highlights in real-world scenes. To address these problems, this thesis constructs a specular highlight removal dataset for real-world scenes; proposes a framework based on convolutional neural networks to solve the problem of traditional algorithms producing large black shadow artifacts when removing highlights; uses a local self-attention mechanism to perform highlight detection and removal iteratively, addressing the inaccurate restoration of texture details in highlight regions; and finally studies joint highlight detection and removal, focusing on scenes under complex illumination where reflection, refraction, and transmission effects coexist.

The main contributions and innovations of this thesis are summarized as follows:

1. Construction of a real-world dataset for specular highlight removal

To address the lack of a public real-world specular highlight removal dataset and of sufficient training data, this thesis designs and builds a capture rig based on optical theory including the Fresnel reflection model, Snell's law of refraction, and Malus's law, and uses it to establish a high-quality benchmark dataset for specular highlight removal, obtaining for the first time paired images of specular highlight and diffuse regions in real-world scenes. The dataset contains 13,380 images of 2,310 different scenes and can be used to train deep neural network models and to quantitatively evaluate highlight removal results.

2. A convolutional-neural-network-based method for single-image highlight removal

This thesis proposes a new data-driven method based on convolutional neural networks for automatically removing specular highlights from a single image. The method extracts image features with a VGG19 network, introduces highlight mask images and the HSV color space to reduce color differences, and uses a generative adversarial network to judge the quality of the highlight-free image, achieving data-driven automatic removal of highlights from a single image. To improve the similarity between the generated highlight-free image and the ground-truth diffuse image, a contextual loss, an adversarial loss, and a consistency loss are combined to jointly optimize the model parameters. Test results on real-world scenes show that the method solves the black-artifact problem in single-image highlight removal.

3. A single-image highlight removal method based on a local self-attention mechanism

This thesis proposes a highlight removal method based on a generative adversarial dual-branch network, in which one branch performs highlight detection and the other performs highlight removal, and the two branches execute the detection and removal tasks iteratively. By locating highlight regions and introducing a local self-attention mechanism, the method directly models the mapping between diffuse regions and specular highlight regions, making the network focus on the distribution and texture details of highlight regions; this enhances the restoration of highlight regions and increases the similarity between the generated images and the ground-truth diffuse images. Extensive comparative experiments show that the method effectively addresses the inaccurate restoration of texture details in single-image highlight removal and performs better on sparsely distributed and small highlight regions. Tests on real indoor and outdoor scenes further show that the proposed network generalizes well, and also indicate that the dataset built in this thesis closely matches the actual illumination conditions of real environments.

4. A joint single-image highlight detection and removal method based on Unet-Transformer

To extend the algorithm to scenes under more complex lighting conditions, this thesis proposes a method that simultaneously detects and removes specular highlights in a single image. Based on observations of real-world scenes containing highlights, the thesis summarizes two characteristics of highlight images: highlights are small in scale and sparsely distributed, and the colors of both the highlight effects and the refracted light in highlight regions are similar to the color of the light source. The method feeds the input highlight image into an encoder-decoder highlight detection network to generate a highlight mask image, and then feeds the highlight image and the mask image together into a highlight removal network built from Transformer encoders. The Transformer encoder captures global features well and establishes relationships between consecutive self-attention layers, greatly improving the expressive power of the model. Experiments on the benchmark dataset and on indoor and outdoor test sets show that the method effectively detects and removes highlights on metallic, transparent, and other objects under complex lighting, improving both the accuracy and the generalization of highlight removal.

Abstract (English)

With recent advances in digital hardware devices and computational photography techniques, the image has become one of the most important carriers of visual information. Images captured in real-world scenes usually contain specular highlights, which to a certain extent weaken or even destroy the intrinsic color and texture information of objects. This has a negative impact on subsequent image understanding and applications, such as image segmentation, object recognition, and 3D reconstruction. Therefore, detecting and removing specular highlights and restoring complete texture details have drawn considerable attention in computer vision, image processing, and multimedia, and the results of this study are also of practical significance.

The task of specular highlight detection and removal has been studied for decades, but existing knowledge-driven traditional methods and data-driven deep learning methods still struggle to effectively remove specular highlights in real-world scenes. To this end, this thesis first constructs a real-world dataset for specular highlight removal. Based on this dataset, a basic framework built on a convolutional neural network is proposed to solve the problem that traditional algorithms generate large black shadow artifacts. Further, a local self-attention mechanism is used to iteratively perform specular highlight detection and removal, addressing the inaccurate restoration of texture details in highlight regions. Finally, this thesis explores joint specular highlight detection and removal in the simultaneous presence of three light effects: reflection, refraction, and transmission.

The main contributions and innovations of this thesis are summarized as follows:

 

1. Construction of a real-world specular highlight removal dataset.

There is no public real-world specular highlight removal dataset for training highlight removal neural networks. Based on optical foundations such as the Fresnel reflection model, Snell's law of refraction, and Malus's law, we built a studio with controllable lighting for photography. Using this studio, we obtained, for the first time, a high-quality specular-diffuse benchmark dataset in which each specular highlight image is paired with a ground-truth specular-free diffuse image. The large-scale dataset contains 13,380 images of 2,310 different scenes and can be used for training deep neural models and for quantitative evaluation of specular highlight removal methods.
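For background, two of the optical laws cited above can be stated in one line each (this is standard optics, not a description of the thesis capture protocol, which is detailed in the thesis body). In cross-polarized capture setups, specular reflection largely preserves the polarization of the incident light while diffuse reflection tends to depolarize it, so an analyzer rotated by an angle theta attenuates the specular component according to Malus's law:

I(\theta) = I_0 \cos^2 \theta  % Malus's law
n_1 \sin \theta_1 = n_2 \sin \theta_2  % Snell's law of refraction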

2. A convolutional neural network for single-image highlight removal.

This thesis proposes a new data-driven method to automatically remove specular highlights from a single image. Image features are extracted with a VGG-19 network, and highlight masks and the HSV color space are introduced to reduce chromatic aberration. A generative adversarial network is then used to judge the quality of the generated image. To improve the similarity between the generated specular-free image and the ground-truth diffuse image, a contextual loss, an adversarial loss, and a consistency loss are combined to jointly optimize the model parameters. Experimental results on real scenes show that the proposed method eliminates the black artifacts that arise in single-image specular highlight removal.
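As a minimal sketch of how such a three-term objective can be assembled in PyTorch (the placeholder generator/discriminator, the loss weights, and the use of a VGG-19 perceptual term as a stand-in for the contextual loss are illustrative assumptions, not the thesis implementation):

import torch
import torch.nn as nn
from torchvision.models import vgg19

# Placeholder generator/discriminator: the thesis networks are far larger.
generator = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))
discriminator = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1))

# Frozen VGG-19 features used as a perceptual stand-in for the contextual loss.
# (ImageNet normalization of the inputs is omitted for brevity.)
vgg_features = vgg19(weights="IMAGENET1K_V1").features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

l1 = nn.L1Loss()
bce = nn.BCEWithLogitsLoss()

def generator_loss(highlight_img, gt_diffuse, w_ctx=1.0, w_adv=0.01, w_cons=10.0):
    """Combine perceptual, adversarial, and L1 consistency terms (assumed weights)."""
    fake = generator(highlight_img)
    # Feature-space similarity to the ground-truth diffuse image.
    loss_ctx = l1(vgg_features(fake), vgg_features(gt_diffuse))
    # The generator tries to make the discriminator label its output as real.
    pred_fake = discriminator(fake)
    loss_adv = bce(pred_fake, torch.ones_like(pred_fake))
    # Pixel-wise consistency with the ground-truth diffuse image.
    loss_cons = l1(fake, gt_diffuse)
    return w_ctx * loss_ctx + w_adv * loss_adv + w_cons * loss_cons

# Example: a dummy 256x256 image pair.
loss = generator_loss(torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256))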

3. Single-image highlight removal based on a local self-attention mechanism.

We propose an end-to-end dual-branch network for highlight removal based on a generative adversarial network (GAN), in which one branch performs specular highlight detection and the other performs specular highlight removal; the two tasks are executed iteratively. By locating the highlight regions and introducing a local self-attention mechanism, we directly model the mapping between diffuse and specular highlight regions, making the network focus on the distribution and texture details of the highlight regions. This enhances the quality of specular highlight removal and reduces the differences between the generated images and the ground-truth diffuse images. Extensive comparative experiments show that this method effectively solves the problem of inaccurate restoration of texture details after highlight removal in a single image, and performs better on sparsely distributed and small highlight regions. In addition, exhaustive experiments on real indoor and outdoor scenes show that the proposed network has strong generalization performance, and also indicate that the dataset constructed in Contribution 1 is close to the actual lighting conditions of real environments.
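A minimal sketch of what a mask-guided local (window-based) self-attention block might look like is given below; the window size, feature dimensions, and the way the highlight mask gates the attention output are illustrative assumptions rather than the thesis module:

import torch
import torch.nn as nn

class LocalSelfAttention(nn.Module):
    """Self-attention inside non-overlapping windows, gated by a highlight mask."""

    def __init__(self, dim=64, window=8, heads=4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feat, mask=None):
        # feat: (B, C, H, W); mask: (B, 1, H, W) with 1 on highlight pixels.
        # H and W are assumed to be divisible by the window size.
        B, C, H, W = feat.shape
        w = self.window
        # Partition the feature map into (H/w * W/w) windows of w*w tokens each.
        x = feat.reshape(B, C, H // w, w, W // w, w)
        x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, C)
        out, _ = self.attn(x, x, x)
        if mask is not None:
            m = mask.reshape(B, 1, H // w, w, W // w, w)
            m = m.permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, 1)
            # Use the attention output on highlight pixels, pass the rest through.
            out = m * out + (1 - m) * x
        # Undo the window partition back to (B, C, H, W).
        out = out.reshape(B, H // w, W // w, w, w, C)
        return out.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)

# Example: 64-channel features with a sparse random highlight mask.
feat = torch.randn(1, 64, 32, 32)
mask = (torch.rand(1, 1, 32, 32) > 0.9).float()
out = LocalSelfAttention()(feat, mask)   # -> (1, 64, 32, 32)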

4. Joint specular highlight detection and removal in single images via Unet-Transformer.

In order to extend our algorithm to scenes under more complex lighting conditions, we then propose a new deep neural model that jointly detects and removes specular highlights from a single image. We have observed that scenes with specular highlights in the real world share two characteristics: firstly, specular highlights are usually small and sparsely distributed; secondly, the colors of both the highlights and the refracted light in the highlight areas are similar to the color of the light source. In this thesis, we utilize an encoder-decoder network to detect specular highlights and generate a highlight mask image. The highlight image and the mask are then fed into a Transformer network for highlight removal. The Swin Transformer we apply captures global features well and establishes relationships between consecutive self-attention layers. This enables interaction and connection between windows of the previous layer, which greatly improves the expressive ability of the model. Through experiments on a public benchmark dataset and on real indoor and outdoor scenes, we show that our method can effectively detect and remove the specular highlights of objects such as metals and transparent materials under complex lighting conditions. Our approach further improves the accuracy and generalization performance of highlight removal.
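At the pipeline level, the two-stage design can be summarized roughly as follows; the tiny encoder-decoder and Transformer modules below are deliberately simplified stand-ins (layer counts, channel widths, and patch size are assumptions, not the thesis architecture):

import torch
import torch.nn as nn

class TinyDetector(nn.Module):
    """Stand-in encoder-decoder that predicts a 1-channel highlight mask."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid())
    def forward(self, x):
        return self.dec(self.enc(x))

class TinyRemover(nn.Module):
    """Stand-in removal network: patch-embed image + mask, run Transformer
    encoder layers, then project the tokens back to an RGB image."""
    def __init__(self, patch=8, dim=96):
        super().__init__()
        self.embed = nn.Conv2d(4, dim, patch, stride=patch)      # 3 RGB + 1 mask channel
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.unembed = nn.ConvTranspose2d(dim, 3, patch, stride=patch)
    def forward(self, img, mask):
        x = self.embed(torch.cat([img, mask], dim=1))            # (B, dim, H/p, W/p)
        B, D, Hp, Wp = x.shape
        tokens = self.encoder(x.flatten(2).transpose(1, 2))      # (B, Hp*Wp, dim)
        return self.unembed(tokens.transpose(1, 2).reshape(B, D, Hp, Wp))

detector, remover = TinyDetector(), TinyRemover()
img = torch.randn(1, 3, 64, 64)      # dummy highlight image
mask = detector(img)                 # stage 1: predict the highlight mask
diffuse = remover(img, mask)         # stage 2: remove highlights given the mask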

Keywords: specular highlight removal, specular highlight detection, polarized light, highlight dataset, deep learning
Language: Chinese
Document type: Doctoral dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/48680
Collections: Graduates - Doctoral Dissertations; State Key Laboratory of Multimodal Artificial Intelligence Systems - 3D Visual Computing
Recommended citation (GB/T 7714):
吴仲琦 (Zhongqi Wu). 基于深度学习的真实场景单张图像高光去除研究 [D]. Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences, 2022.
Files in this item:
博士学位论文_吴仲琦.pdf (55,728 KB) · Doctoral dissertation · Restricted access · License: CC BY-NC-SA