With the development of image acquisition equipment and multimedia technology, the ways of obtaining images have become increasingly diverse, and many kinds of heterogeneous images are emerging. As a result, many practical applications built on specific types of heterogeneous images have arisen. Traditional computer vision models mainly focus on natural scene images, such as photos and street-view images taken by mobile phones or cameras. Images of different forms often exhibit large inter-domain modality gaps. Consequently, traditional computer vision algorithms usually struggle to address perception problems on specific heterogeneous images.
To address these problems, this paper exploits domain knowledge tied to specific visual tasks, i.e., knowledge about specific heterogeneous images, and proposes a visual perception framework based on heterogeneous image knowledge. On the one hand, the framework builds visual representations of heterogeneous images through feature extraction and representation learning. On the other hand, it uncovers and establishes knowledge representations for visual tasks on specific heterogeneous images, and combines the visual and knowledge representations to solve these tasks effectively. The knowledge representations are obtained from heterogeneous image knowledge models. The framework obtains task-specific heterogeneous image knowledge either by summarizing human experience or by knowledge mining, and establishes a knowledge guidance mechanism that integrates this knowledge into traditional visual perception models, deepening the models' understanding of the task. Models designed within the framework are studied on several heterogeneous image tasks: image dehazing, face sketch synthesis, face caricature synthesis, and weakly supervised sketch-based person search. The results show that the framework performs well on traditional low-level, mid-level, and high-level visual tasks. The work can be summarized in the following three aspects:
1. Images captured in adverse weather and poor environmental conditions often suffer from reduced contrast, dull colors, and missing detail. This degrades not only subjective visual quality but also the performance of intelligent systems such as security monitoring and autonomous driving. On the one hand, for single image dehazing, a heterogeneous image knowledge representation is established through an attention mechanism, based on the dark channel prior and the characteristics of hazy images, and is combined with a convolutional neural network that serves as the visual model. On the other hand, transforming dark, noisy images into bright, noise-free images is treated as a special case of image-to-image translation with noise. Because VGG networks are sensitive to noise, a heterogeneous image knowledge model is constructed in the form of a perceptual loss function, and a generative adversarial network is used as the visual perception model. The perceptual loss mitigates the effect of noise and improves performance by strengthening the constraints on structural consistency at different levels. In summary, the main contributions are: (1) A feature aggregation attention network (FAAN) for single image dehazing is proposed, which combines attention mechanisms with residual learning and adaptively aggregates features from different levels; (2) An enhanced generative adversarial network (EGAN) is proposed to solve image-to-image translation with noise in an end-to-end manner; (3) Experiments on the dehazing dataset show that the proposed method achieves the best dehazing results, and further experiments verify that the proposed EGAN significantly outperforms other state-of-the-art methods and also achieves state-of-the-art performance when applied directly to image denoising.
2. As one of the most important subjects in biometric recognition, the human face is rich in information and easy to acquire. In practice, however, face photos are not always available. In addition, with the growth of animation, comics, games, and novels (ACGN) culture and of social media, sketches, caricatures, and other representative ACGN works appear more and more often in daily life. Face images naturally carry rich identity information, yet existing heterogeneous face synthesis methods rarely consider preserving identity during synthesis. Therefore, on the one hand, for face photo-sketch synthesis, this paper introduces additional identity labels and constructs a heterogeneous image knowledge model, namely an identity recognition loss function, while using a cycle-consistency-based generative adversarial network as the visual model. On the other hand, for caricature synthesis, a heterogeneous image knowledge model, namely an identity preservation loss function, is constructed from implicit face identity characteristics, and a generative adversarial network containing warping controllers is used to obtain visual representations. In summary, the main contributions are: (1) An identity-sensitive generative adversarial network is proposed for face photo-sketch synthesis; (2) An identity-preserving generative adversarial network is proposed for unsupervised photo-to-caricature translation; (3) Extensive experiments show that, compared with other advanced methods, the proposed methods achieve the best performance, and the synthesized results are more realistic, more visually appealing, and retain more identity details.
3. Although existing person search methods achieve good performance, they require training images with fine-grained labels. Annotating these labels manually is expensive and difficult at large scale. To overcome this problem, a weakly supervised person search method is proposed. In addition, because photos of the target person are not always available in practical scenarios, this paper proposes and investigates the weakly supervised sketch-based person search problem, in which a sketch rather than a photo is used as the probe for retrieval. Based on the sparse pixel distribution of sketches, a heterogeneous image knowledge representation is built with an attention mechanism, and the proposed weakly supervised person search method serves as the visual perception model. In summary, the main contributions are: (1) A weakly supervised learning method based on clustering and patches is proposed for weakly supervised person search; (2) The weakly supervised sketch-based person search problem is proposed and studied, and a solution based on clustering and feature attention is designed; (3) Extensive experiments on two publicly available datasets validate the feasibility of the proposed weakly supervised setting for person search and the effectiveness of the proposed method.
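The dark channel prior mentioned in the first aspect can be illustrated with a short sketch. The prior observes that, in most non-sky patches of a haze-free outdoor image, at least one color channel contains some pixels of very low intensity, whereas haze lifts these minima. The code below computes the standard dark channel (a per-pixel minimum over color channels followed by a minimum over a local patch). It is only an illustration of the prior, not the FAAN network; the function name `dark_channel`, the list-of-tuples image layout, and the default `patch_size` are assumptions made for this example.

```python
def dark_channel(image, patch_size=3):
    """Return the dark channel of an RGB image.

    `image` is a list of rows, each row a list of (r, g, b) tuples in [0, 1].
    `patch_size` (odd) is the side of the square neighbourhood; 3 is only an
    illustrative default, not a value taken from the thesis.
    """
    height, width = len(image), len(image[0])
    radius = patch_size // 2

    # Step 1: minimum over the three colour channels at every pixel.
    min_rgb = [[min(pixel) for pixel in row] for row in image]

    # Step 2: minimum over a square patch around every pixel,
    # with the patch clipped at the image border.
    dark = []
    for y in range(height):
        row = []
        for x in range(width):
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            row.append(min(min_rgb[j][i] for j in ys for i in xs))
        dark.append(row)
    return dark


# Toy inputs (hypothetical): a washed-out, haze-like patch and a clearer patch
# that contains one strongly coloured pixel.
hazy = [[(0.8, 0.8, 0.8)] * 3 for _ in range(3)]
clear = [[(0.9, 0.8, 0.7)] * 3 for _ in range(3)]
clear[1][1] = (0.5, 0.1, 0.3)

hazy_dark = dark_channel(hazy)
clear_dark = dark_channel(clear)
```

In this toy case the haze-like patch keeps a high dark channel everywhere, while the single saturated pixel pulls the dark channel of the clearer patch down across its whole neighbourhood; this is the kind of signal a dehazing model can exploit as prior knowledge.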
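For the face photo-sketch synthesis described in the second aspect, the combination of a cycle-consistency-based generative adversarial network with an identity recognition loss can be written down roughly as follows. This is a sketch under assumptions rather than the thesis's exact objective: $G$ (photo to sketch) and $F$ (sketch to photo) denote the two generators, $p_{\phi}$ is a hypothetical identity classifier, $c$ is the identity label of photo $x$, and the weights $\lambda_{\mathrm{cyc}}$ and $\lambda_{\mathrm{id}}$ are illustrative. The cycle term is the standard cycle-consistency loss; writing the identity term as a cross-entropy is an assumption.

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{cyc}}(G, F) &= \mathbb{E}_{x}\big[\lVert F(G(x)) - x \rVert_{1}\big]
  + \mathbb{E}_{y}\big[\lVert G(F(y)) - y \rVert_{1}\big], \\
\mathcal{L}_{\mathrm{id}}(G) &= \mathbb{E}_{(x,\,c)}\big[-\log p_{\phi}\big(c \mid G(x)\big)\big], \\
\mathcal{L}_{\mathrm{total}} &= \mathcal{L}_{\mathrm{adv}}
  + \lambda_{\mathrm{cyc}}\,\mathcal{L}_{\mathrm{cyc}}
  + \lambda_{\mathrm{id}}\,\mathcal{L}_{\mathrm{id}}.
\end{aligned}
```

Under this reading, the adversarial and cycle terms play the role of the visual model, while the identity term injects the heterogeneous image knowledge that a synthesized sketch should still be recognizable as the same person.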
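The clustering step mentioned in the third aspect can be illustrated in miniature. One common way to train without identity annotations is to cluster the features of detected persons and treat the cluster indices as pseudo-identity labels. The abstract does not say which clustering algorithm is used, so the sketch below substitutes plain k-means with a deterministic initialization; the function names, the 2-D toy features, and the number of clusters are all assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def cluster_pseudo_labels(features, num_clusters, num_iters=10):
    """Assign a pseudo-identity label to each feature vector with k-means.

    Centroids are initialised from the first `num_clusters` features so the
    result is deterministic (an assumption made to keep the example simple).
    """
    centroids = [list(f) for f in features[:num_clusters]]
    labels = [0] * len(features)
    for _ in range(num_iters):
        # Assignment step: each feature joins its nearest centroid.
        labels = [
            min(range(num_clusters),
                key=lambda k: squared_distance(f, centroids[k]))
            for f in features
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(num_clusters):
            members = [f for f, label in zip(features, labels) if label == k]
            if members:
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical 2-D features of six person crops from two unknown identities.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1), (5.0, 5.2)]
pseudo_labels = cluster_pseudo_labels(features, num_clusters=2)
```

The resulting labels can then stand in for manual identity annotations when training a re-identification branch, which is the general idea behind clustering-based weak supervision.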
|Keyword||Heterogeneous images; computer vision; deep learning; generative adversarial networks|
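|严岚 (Yan Lan). 基于异质图像知识的视觉感知方法研究 (Research on Visual Perception Methods Based on Heterogeneous Image Knowledge) [D]. Institute of Automation, Chinese Academy of Sciences, 2022.|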