With rapidly growing public-safety requirements and the construction of large-scale surveillance camera networks, retrieving a person of interest from large volumes of video has become a key problem in intelligent video analysis systems. Pedestrian attribute recognition and person Re-IDentification (ReID) are important research topics in computer vision. They are complementary, addressing attribute-based and image-based person retrieval respectively, and have become indispensable technologies in modern intelligent video analysis systems. Although person retrieval has made great progress in recent years, many challenges remain, such as the imbalanced distribution of pedestrian attributes, pose variation, background clutter, and occlusion, all of which make it difficult to learn pedestrian representations and attribute classifiers. In this paper, we focus on pedestrian attribute recognition and person ReID for person retrieval. The contributions are as follows:
- Deep multi-attribute recognition model. Typical pedestrian attribute recognition methods adopt hand-crafted features and ignore the relationships among attributes. Because hand-crafted features struggle in complex surveillance scenarios, we first propose to use a convolutional neural network to learn the pedestrian features and the classifier jointly for each attribute. Furthermore, to better exploit the relationships among attributes, we propose a deep multi-attribute recognition model in which the features are shared by all attributes, so that one attribute can assist the learning of another. Finally, an improved cross-entropy loss is proposed to handle the imbalanced attribute distribution. Experimental results show that the proposed end-to-end models learn better representations than hand-crafted features and that, by exploiting the relationships among attributes, the deep multi-attribute recognition model achieves state-of-the-art results.
- Pose-guided deep multi-attribute recognition model. Existing methods typically treat pedestrian attribute recognition as an end-to-end multi-label classification problem and ignore the relationships between pedestrian attributes and body structure. In this paper, we propose a pose-guided deep model for pedestrian attribute recognition that better exploits these relationships; for example, sunglasses appear only in the head region. The proposed model first discovers pose-related body regions, then learns region-based features, and finally combines the features of all regions for attribute recognition. At test time, we fuse the scores from the full body and the body-part regions to produce the final results. Experimental results show that the proposed model recognizes local attributes much better and that the fused scores achieve results close to the state of the art on several popular datasets.
- Deep context-aware features over body and latent parts for person ReID. Although person ReID has made great progress in recent years, extracting a feature representation that is robust to pose variation and background clutter remains an important problem. In this paper, we propose a deep multi-scale context-aware network to better learn fine-grained feature representations. To handle pose variation and cluttered backgrounds, we propose a latent part localization model that learns body parts adaptively without ground-truth part supervision; the localized body parts are then used to build a part-based pedestrian representation. Finally, the feature representations from the full body and the body parts are fused for person ReID. Experimental results show that the latent-part-based representation can partially handle pose variation and cluttered backgrounds, and that the fused representation obtains much better results on popular person ReID datasets.
- A richly annotated pedestrian dataset. Pedestrian attribute recognition and person ReID, which are complementary to each other, are two core components of person retrieval. Existing public datasets typically focus on only one of these two problems. To facilitate further research on person retrieval, we introduce a new richly annotated dataset that supports pedestrian attribute recognition, attribute-based person retrieval, and person ReID simultaneously, and we provide baseline results for these three tasks. Furthermore, an instance-based metric for attribute recognition is introduced to better capture the dependency among the predictions of multiple attributes. Finally, several interesting problems, e.g., the joint feature learning of attribute recognition and ReID and the problem of cross-day person ReID, are explored to show the challenges and future directions in person retrieval.
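The first contribution mentions a cross-entropy loss adapted to the imbalanced attribute distribution, but this abstract does not give its form. The following is a minimal sketch of the general idea only, assuming a sigmoid cross-entropy in which each attribute's positive and negative terms are reweighted by that attribute's positive ratio in the training set. The function names, the `pos_ratio` argument, and the exponential weighting are illustrative assumptions, not the loss defined in the paper.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def weighted_bce(logits, labels, pos_ratio):
    """Weighted sigmoid cross-entropy, summed over attributes, averaged over samples.

    logits, labels: one list per sample, one entry per attribute.
    pos_ratio: assumed fraction of positive training samples per attribute.
    """
    total = 0.0
    for x_row, y_row in zip(logits, labels):
        for x, y, p in zip(x_row, y_row, pos_ratio):
            # Assumed weighting: rare positives (small p) weigh more than
            # common negatives, and vice versa for frequent attributes.
            w = math.exp(1.0 - p) if y == 1 else math.exp(p)
            log_sig = -softplus(-x)          # log(sigmoid(x))
            log_one_minus_sig = -softplus(x)  # log(1 - sigmoid(x))
            total -= w * (y * log_sig + (1 - y) * log_one_minus_sig)
    return total / len(logits)


# Example: one sample, two attributes; the first attribute is rare (10% positive).
loss = weighted_bce([[0.0, 2.0]], [[1, 0]], [0.1, 0.5])
```

With this weighting, a missed positive on a rare attribute costs more than a missed negative on the same attribute, which counteracts the tendency to predict the majority label.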
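The instance-based metric mentioned in the last contribution is not defined in this abstract. As a hedged illustration, the sketch below uses the example-based accuracy, precision, recall, and F1 definitions that are common in multi-label classification, computed per pedestrian image over the set of predicted attributes and then averaged. Treating these as the metric in question is an assumption, as are the function name and the convention that an empty denominator contributes zero.

```python
def instance_based_metrics(y_true, y_pred):
    """Example-based multi-label metrics, averaged over instances.

    y_true, y_pred: one list of 0/1 attribute labels per instance.
    """
    def safe_div(num, den):
        # Assumed convention: an empty denominator contributes 0.
        return num / den if den else 0.0

    n = len(y_true)
    acc = prec = rec = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        true_set = {i for i, v in enumerate(t_row) if v == 1}
        pred_set = {i for i, v in enumerate(p_row) if v == 1}
        inter = len(true_set & pred_set)
        acc += safe_div(inter, len(true_set | pred_set))
        prec += safe_div(inter, len(pred_set))
        rec += safe_div(inter, len(true_set))
    acc, prec, rec = acc / n, prec / n, rec / n
    f1 = safe_div(2.0 * prec * rec, prec + rec)
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}


# Two instances, three attributes each.
metrics = instance_based_metrics(
    [[1, 1, 0], [0, 1, 0]],
    [[1, 0, 0], [0, 1, 1]],
)
```

Unlike label-based metrics, which score each attribute independently across the dataset, these quantities score the whole attribute set of each instance at once, so they are sensitive to how consistently multiple attributes are predicted for the same person.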