English Abstract

Artificial intelligence has entered a new stage with the development of deep learning. Deep neural networks have made great progress in several fields, especially in recognition tasks of computer vision. Abundant image data can be obtained via the Internet. However, methods based on deep neural networks rely heavily on labeled data, and labeling these data manually is expensive and time-consuming. The generative adversarial network (GAN) offers a new way to address this problem. A GAN optimizes a generator and a discriminator simultaneously, whose objectives oppose each other; through this adversarial training, the generator learns to produce realistic virtual samples.
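The opposing objectives mentioned above correspond to the standard GAN minimax game (the well-known general formulation, stated here for context rather than taken from this dissertation):

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]$$

where $G$ maps a noise vector $z$ to a virtual sample and $D$ estimates the probability that its input is a real sample rather than a generated one.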
A large number of training samples can be generated through adversarial learning. Besides, by using a domain critic network and adversarial learning, samples from a labeled source domain and an unlabeled target domain can be mapped to domain-invariant feature representations, which helps transfer the discriminative information learned from the labeled source domain to the unlabeled target domain and improves performance on the target domain. This dissertation focuses on these two ideas; the specific research contents and contributions are summarized as follows:
To address the scarcity and imbalance of training data in the scene text recognition problem, this dissertation proposes a high-quality virtual sample generation framework based on image information disentangling. In this framework, the information of an image is disentangled into two parts, content and style, where the content carries the specific semantic information and the style carries everything else. By learning a style encoder, the framework can generate high-quality virtual samples that render specified content in diverse styles. For training, we design a double-cycle adversarial training strategy, which reduces over-fitting by exchanging the contents and styles of the inputs twice. Experimental results show that the generated virtual samples improve vehicle license plate recognition performance. Moreover, the method also applies to other image generation problems, such as generating Chinese characters in different fonts or generating images of shoes and handbags from edge maps.
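The double exchange of contents and styles can be illustrated with a minimal sketch. The encoders and decoder below are toy stand-ins (real networks are learned from data); the point is only the structure of the double cycle, in which swapping twice restores the original pairings and thus yields a reconstruction target:

```python
# Illustrative sketch of the double-cycle exchange idea. An "image" here
# is simply a (content, style) tuple, so encoding is an identity split;
# in the dissertation, encoder and decoder are learned networks.

def encode(image):
    """Split an image into its (content, style) factors."""
    content, style = image
    return content, style

def decode(content, style):
    """Recombine a content code and a style code into an image."""
    return (content, style)

def double_cycle(img_a, img_b):
    """Exchange the contents and styles of two images twice.

    After the first exchange each mixed image carries one input's
    content with the other's style; exchanging again restores the
    original pairings, giving a cycle-consistency reconstruction target.
    """
    c_a, s_a = encode(img_a)
    c_b, s_b = encode(img_b)
    # First exchange: cross-pair contents and styles.
    mixed_ab = decode(c_a, s_b)
    mixed_ba = decode(c_b, s_a)
    # Second exchange on the mixed images swaps the styles back.
    c_a2, s_b2 = encode(mixed_ab)
    c_b2, s_a2 = encode(mixed_ba)
    rec_a = decode(c_a2, s_a2)
    rec_b = decode(c_b2, s_b2)
    return rec_a, rec_b

img_a = ("plate:ABC123", "style:sunny")
img_b = ("plate:XYZ789", "style:night")
rec_a, rec_b = double_cycle(img_a, img_b)
assert rec_a == img_a and rec_b == img_b  # double exchange reconstructs both
```

In the learned setting, the reconstruction error between `rec_a`/`rec_b` and the inputs is what supervises training without paired data.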
Existing pedestrian detection models are not robust to scale and occlusion variations and generalize poorly in cross-domain testing. To address these issues, this dissertation proposes a novel data augmentation scheme based on person transfer. Specifically, pedestrian objects from other datasets are embedded into the target scene, which generates a large number of virtual samples with scale and occlusion variations. We then design an Attribute Preserving Generative Adversarial Network (APGAN), which improves the fidelity of the generated images while preserving the attributes of the embedded pedestrians. Finally, the generated images are used for data augmentation, which improves the robustness and generalization ability of pedestrian detectors by introducing scale and occlusion diversity into the training set. Experimental results show that this scheme improves the performance of several pedestrian detection methods on multiple benchmarks.
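The embedding step can be sketched as a simple copy-paste operation with controllable scale and occlusion (a hypothetical helper using plain NumPy arrays as image stand-ins; APGAN's refinement of the pasted result is not modeled here):

```python
import numpy as np

def paste_person(scene, person, top, left, scale=1, visible_frac=1.0):
    """Embed a person crop into a scene at (top, left).

    scale: integer upscaling factor, varying the pedestrian's size.
    visible_frac: fraction of the crop (from the top) kept visible,
    simulating occlusion of the lower body.
    """
    # Integer-factor nearest-neighbour upscaling.
    person = np.repeat(np.repeat(person, scale, axis=0), scale, axis=1)
    h, w = person.shape[:2]
    vis_h = max(1, int(round(h * visible_frac)))
    out = scene.copy()
    out[top:top + vis_h, left:left + w] = person[:vis_h]
    return out

scene = np.zeros((8, 8), dtype=np.uint8)
person = np.full((2, 2), 255, dtype=np.uint8)
# Pedestrian doubled in size, lower half occluded.
aug = paste_person(scene, person, top=1, left=1, scale=2, visible_frac=0.5)
```

Varying `top`, `left`, `scale`, and `visible_frac` across many pastes is what yields the scale and occlusion diversity described above.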
Most existing person pose transfer methods require identity information to build pairs of person images with the same identity but different poses, and they also suffer from background over-fitting. To tackle these problems, this dissertation proposes an unsupervised cycle-consistent person pose transfer approach. It is trained on unpaired cross-identity person images and preserves background information well. Compared with previous methods, the proposed approach achieves better results on the cross-identity person pose transfer task and comparable results on the self-identity one. Moreover, the method serves as an effective data augmentation scheme for person recognition tasks, as validated by experiments on pedestrian re-identification and detection.
To address the severe performance drop of object re-identification (re-ID) models under cross-domain testing, this dissertation proposes a novel unsupervised domain adaptation framework for object re-ID that fuses feature adversarial learning with self-similarity clustering. Previous work has argued that domain-invariant feature learning methods are unsuitable for the re-ID problem because the label spaces of the source and target domains are entirely disjoint. This dissertation questions that opinion and is the first to introduce feature adversarial learning into the re-ID problem. The proposed framework minimizes the discrepancy between source- and target-domain feature representations, measured by adversarial training, and thereby learns domain-invariant feature representations; experimental results show the effectiveness of this module. Besides, this dissertation designs a self-similarity clustering module to mine the implicit similarity relationships among the unlabeled samples of the target domain. Through unsupervised clustering, pseudo-identity labels are generated for the target domain data and then combined with the labeled source data to train the feature extractor network. This module differs from existing methods in that the number of cluster centers is fixed and a relabeling algorithm is presented to establish correspondence between the two groups of pseudo-identity labels produced by consecutive clustering iterations. By fusing the two modules, we set a new state of the art for unsupervised domain adaptation on vehicle re-ID and person re-ID benchmarks.
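The clustering-with-relabeling idea can be sketched as follows. This is a toy k-means with a fixed number of centers plus a greedy overlap-based relabeling; the dissertation's actual clustering method and relabeling algorithm may differ:

```python
import numpy as np

def kmeans(feats, k, iters=20, seed=0):
    """Minimal k-means with a fixed number of cluster centers k."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(feats[:, None] - centers[None], axis=2)
        labels = dists.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = feats[labels == j].mean(0)
    return labels

def relabel(prev, curr, k):
    """Map each new cluster id to the old id it overlaps most.

    Greedy stand-in for the dissertation's relabeling algorithm: it keeps
    pseudo-identity labels consistent across clustering iterations, so the
    classifier head trained on them is not reshuffled every iteration.
    """
    overlap = np.zeros((k, k), dtype=int)
    for p, c in zip(prev, curr):
        overlap[c, p] += 1
    mapping, used = {}, set()
    # Assign pairs in order of decreasing overlap.
    pairs = sorted(((c, p) for c in range(k) for p in range(k)),
                   key=lambda cp: -overlap[cp[0], cp[1]])
    for c, p in pairs:
        if c not in mapping and p not in used:
            mapping[c] = p
            used.add(p)
    return np.array([mapping[c] for c in curr])

# Two well-separated blobs standing in for target-domain features.
rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(0, 0.1, (10, 2)),
                   rng.normal(5, 0.1, (10, 2))])
labels_1 = kmeans(feats, 2, seed=0)          # first clustering iteration
labels_2 = kmeans(feats, 2, seed=3)          # next iteration, ids may permute
aligned = relabel(labels_1, labels_2, 2)     # ids made consistent with labels_1
```

In the full framework, the aligned pseudo-identity labels are combined with the labeled source data to retrain the feature extractor each iteration.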