English Abstract | As smart city development steadily advances, surveillance camera networks are being deployed at an ever larger scale. At this scale, retrieving and processing surveillance video manually is impractical, so recent research focuses on using computer vision to automate the surveillance pipeline.
As a critical component of the intelligent surveillance system, person re-identification has drawn extensive attention from academia and industry.
Given a query image of a person, person re-identification (re-ID) aims to retrieve other images of the same person captured by multiple non-overlapping cameras. The key to person re-identification is to learn a feature space with intra-class compactness and inter-class separability. However, drastic changes in camera viewpoint, illumination, background, and human pose can cause large differences in visual appearance between images of the same person, which greatly challenges the discrimination ability of the re-ID model. In addition, the heavy reliance on annotated data limits the deployment of person re-identification models in real scenarios. To overcome these issues, this thesis explores improving the discrimination and generalization of person re-identification from three aspects.
1. This thesis proposes a feature aggregation method for fully supervised person re-identification, which effectively improves the model's robustness to intra-class variation. Intra-class variation is the main challenge to the model's discrimination ability. To alleviate it, existing methods either classify samples individually or constrain the pairwise similarity among a small number of samples. Such methods ignore the dense connections among input samples, so they tend to converge to poor local optima. This work instead formulates a training batch as a densely connected graph and aggregates features according to a random walk process on the graph. This practice reduces confusion around the decision boundary and enhances robustness to intra-class variation. Furthermore, this work extends the idea to the post-processing stage of person re-identification and to tasks in other dimensions. Experimental results show that feature aggregation improves performance in all stages and dimensions, which verifies the effectiveness and versatility of the idea.
2. This thesis proposes a dual discrepancy adaptation method for cross-domain person re-identification to alleviate the adverse effects of inter-domain shift and inter-camera discrepancy. This work reveals that the low transferability of person re-identification models is attributable to these two discrepancies. To handle inter-camera discrepancy within the target domain, this work proposes a camera-aware neighborhood consistency constraint, which avoids the bias toward intra-camera candidates found in vanilla neighborhood consistency constraints. To cope with inter-domain shift, this work proposes a cross-domain mixup scheme that smooths the transition from the source domain to the target domain. By combining the two components, the proposed model achieves leading performance on all benchmarks for cross-domain person re-identification.
3. This thesis proposes a 3D synthesis method for person re-identification. Rather than training the model on prohibitively expensive annotated data, this work uses synthetic data as the source domain for training. It presents a pose-controllable 3D generative model for virtual data synthesis. Specifically, the model guides the tri-plane representation via rasterization and uses Deep Marching Tetrahedra to directly output a 3D mesh. By manipulating the pose and camera perspective, this work can synthesize diverse training samples for person re-identification. It then employs the cross-domain person re-identification algorithm to transfer from the synthetic domain to the real domain. In this way, a transferable person re-identification model can be obtained without any identity annotations.
In summary, this thesis first studies the discrimination ability of person re-identification models in a fully supervised setting.
The core of this research is to suppress the adverse effect of intra-class variation. After that, this thesis explores the generalization problem of person re-identification, reveals the existence of dual discrepancies under the cross-domain setting, and designs corresponding modules to address them. Finally, this thesis attempts to learn a transferable person re-identification model without costly annotations, applying 3D synthesis techniques to generate virtual data that serves as the source domain for training. The proposed components consistently improve performance, and the proposed models achieve leading performance on the corresponding benchmarks. |
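As an illustration of the first contribution, the idea of treating a training batch as a densely connected graph and aggregating features along a random walk can be sketched as follows. This is only a minimal NumPy sketch under stated assumptions, not the thesis's actual formulation: the abstract does not specify how edge weights or transition probabilities are defined, so the cosine-similarity softmax, the `temperature`, the mixing weight `alpha`, the number of `steps`, and the function name `random_walk_aggregate` are all hypothetical choices made for illustration.

```python
import numpy as np


def random_walk_aggregate(features, temperature=0.1, alpha=0.5, steps=2):
    """Aggregate batch features along a random walk on a dense batch graph.

    All hyperparameters here are illustrative assumptions, not values
    taken from the thesis. Requires at least two non-zero feature rows.
    """
    # L2-normalise so that dot products are cosine similarities.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)

    # Dense graph: every sample is connected to every other sample.
    sim = f @ f.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-loops from the walk

    # Row-wise softmax turns similarities into transition probabilities.
    logits = sim / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    transition = weights / weights.sum(axis=1, keepdims=True)

    # Each step mixes the neighbours' current features with the original
    # feature, so a sample drifts towards its most similar neighbours.
    out = f.copy()
    for _ in range(steps):
        out = alpha * (transition @ out) + (1.0 - alpha) * f
    return out


# Toy batch: rows 0-1 mimic one identity, rows 2-3 another.
features = np.array([[1.0, 0.2], [1.0, -0.2], [0.2, 1.0], [-0.2, 1.0]])
aggregated = random_walk_aggregate(features)
```

In this toy batch, the two samples of each identity are pulled closer together while the two identities stay apart, which is the intuition behind using aggregation to resist intra-class variation. In a real training setup the aggregated features would feed the loss, and the graph construction would follow the thesis rather than this sketch.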