Research on Feature Representation for Person Re-identification

	Research on Feature Representation for Person Re-identification (行人再识别的特征表达研究)
	Yang Wenjie (杨文杰)
	2021-05-27
Pages	138
Degree type	Doctoral
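Abstract (translated from the Chinese)	Person re-identification aims to associate pedestrians with the same identity across different surveillance cameras. The massive volume of video data produced by ever-expanding surveillance networks urgently calls for intelligent video surveillance that can process and analyze it efficiently. As one of the key technologies of intelligent video surveillance, person re-identification plays an important role in object tracking, pedestrian retrieval, and identity verification. After more than a decade of development, person re-identification methods have improved greatly in speed, accuracy, and generalization. However, large intra-class variation and small inter-class variation remain one of the main challenges for feature representation learning in person re-identification. First, because of changes in human pose and camera viewpoint, the same pedestrian looks very different under different cameras, while different pedestrians in large-scale data can look very similar. Second, a moving pedestrian appears against complex and changing backgrounds; our experimental analysis shows that background information interferes with the pedestrian's feature representation, in that methods typified by deep learning tend to retrieve false-positive pedestrians who appear in a background similar to that of the target of interest. Finally, inaccurate detection boxes and occlusion by the background leave pedestrians only partially visible, a common challenge in practical applications that enlarges intra-class variation and reduces inter-class variation. To address these problems, this thesis proposes a series of methods from the perspectives of pedestrian local feature learning, background suppression, and body or part visibility. The research work is summarized as follows:
(1) Diverse local feature learning based on class activation maps. On the one hand, a pedestrian's appearance varies greatly with human pose and camera viewpoint, whereas local features are fairly robust to changes in pose, lighting conditions, and similar factors; on the other hand, global features lack sufficient expressive power. To distinguish pedestrians with subtle differences, this thesis proposes diverse local feature learning. Existing work on local feature learning can be divided into implicit local features based on attention mechanisms and part features based on part localization models. Unlike that work, the proposed method needs neither the parameters introduced by an attention mechanism nor an additional part localization model. Instead, it uses class activation maps to locate body regions and proposes an overlapping activation penalty that guides a multi-branch model to learn diverse local features. The method provides a concise, high-performance model for learning pedestrian local features.
(2) Foreground-aware multi-scale feature learning. A moving pedestrian appears against changing backgrounds. The analysis in this thesis finds that the background interferes with feature representation learning for person re-identification: similar backgrounds pull pedestrians of different identities closer in the feature space, while different backgrounds enlarge the intra-class variation. In addition, fusing features from different semantic levels helps strengthen feature discrimination, for example low-level semantics such as color and texture and high-level semantics such as gender and clothing type. To this end, the method explicitly learns a foreground probability map to suppress background information, that is, the non-pedestrian image regions. It also proposes a bottom-up network structure that fuses multi-scale features of different semantic levels to improve feature discrimination. Both qualitative and quantitative experiments demonstrate the effectiveness of the model in suppressing background interference.
(3) Adaptive label smoothing based on body visibility. In practical applications, body parts are often missing because the detector is inaccurate or because part of the pedestrian's body has left the camera's field of view. A pedestrian with missing parts usually receives the same one-hot class label as a fully visible pedestrian, which pushes the model toward overconfident class predictions for pedestrians with missing parts and makes it prone to fitting local body regions. We use the pedestrian bounding-box annotations in end-to-end person re-identification datasets and take the intersection over union (IoU) between the annotated box and the detected box as a measure of body visibility, then use this measure to smooth the class label. Extensive experiments show that the method significantly improves the performance of person re-identification models.
(4) Part-visibility-aware feature learning. A person re-identification model that ignores part visibility cannot accurately extract the visible parts for feature learning and matching, and therefore generalizes poorly to scenarios with missing body parts, such as occlusion. Existing methods either introduce an additional human semantic model to predict visibility directly, or generate pseudo labels of human semantic parts to supervise the part locator inside the re-identification model. We propose a method that learns to perceive the visibility of body parts without relying on an additional human semantic model. Qualitative experiments show that the method can perceive parts such as the head, upper body, and lower body. Quantitative experiments demonstrate its effectiveness in both fully visible and occluded scenarios.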
English abstract	Person re-identification (re-ID) aims to associate pedestrians with the same identity across multiple non-overlapping cameras in a surveillance system. The expansion of surveillance networks produces massive amounts of video, which urgently calls for intelligent video technology for efficient processing and analysis. As one of the key technologies of intelligent video analysis, re-ID plays an important role in object tracking, pedestrian retrieval, and identity verification. Over the last decade, re-ID methods have improved significantly in efficiency, accuracy, and generalization. However, large intra-class variation and small inter-class variation remain one of the main challenges in re-ID representation learning. First, because of the articulated deformation of the human pose and large variations in camera view, the appearance of a person changes considerably across cameras, while different persons in a large-scale dataset can look very similar. Second, a moving person appears against cluttered, changing backgrounds. Our experimental analysis shows that background information biases the feature representation: deep-learning-based methods tend to match a false-positive person whose background is similar to that of the query person. Finally, inaccurate detection and occlusion leave the body only partially visible, a common challenge in practical applications that leads to large intra-class and small inter-class variation. To tackle these challenges, this thesis proposes a series of methods from four aspects: local feature learning, background suppression, body visibility, and body-part visibility. The work is summarized as follows:
(1) Class Activation Map based Diverse Local Feature Learning. On the one hand, the articulated deformation of the human pose and large variations in camera view result in large intra-class variation, whereas local cues are more robust to variations in pose, lighting, and similar factors. On the other hand, a global feature cannot distinguish persons with subtle differences. To this end, this thesis proposes to learn diverse local features. Existing local feature learning methods fall into two groups: latent local feature learning based on attention mechanisms, and part feature learning based on an extra part localization model. In contrast, our method neither introduces the parameters of an attention mechanism nor requires an extra part localization model. We adopt class activation maps to locate the body parts, and the proposed overlapping activation penalty guides the multi-branch model to learn diverse local features. Our method provides a concise, high-performance model for learning local features.
(2) Foreground-Aware Multi-Scale Feature Learning. A moving person appears against cluttered, changing backgrounds. Our experimental analysis shows that background information biases the feature representation: similar backgrounds pull pedestrians of different identities together in the feature space, while different backgrounds push apart pedestrians of the same identity, enlarging the intra-class variation. In addition, fusing features of multiple semantic levels can enhance feature discrimination, for example color and texture at the lower semantic level and gender and clothing style at the higher semantic level. To this end, we propose to explicitly learn a foreground probability map to suppress the background information, i.e., the non-pedestrian image regions. We also propose a bottom-up network that fuses multi-scale features of different semantic levels to improve feature discrimination. Both qualitative and quantitative experiments demonstrate the effectiveness of the model in suppressing background information.
(3) Body Visibility based Adaptive Label Smoothing. In real-world applications, missing body parts are one of the main challenges; they result from inaccurate detection or from the pedestrian walking out of the camera's field of view. Usually, a partially visible person and a holistic person are assigned the same one-hot identity label for feature learning. The model is therefore trained to predict overconfident class probabilities for the partially visible person and is prone to overfitting. We utilize the bounding-box annotations in end-to-end re-ID datasets to measure the body visibility of pedestrians under inaccurate detection. The body visibility is given by the IoU between the ground-truth box and the detected box, and it is used to smooth the identity label. The experimental results show that our method significantly improves the performance of the re-ID model.
(4) Part Visibility-Aware Feature Learning. A re-ID model that is not aware of part visibility cannot accurately locate the body parts for part feature learning and alignment. It therefore does not generalize well to scenarios with missing body parts, e.g., occlusion. Existing methods introduce extra part localization models to predict part visibility directly, or generate pseudo part labels to supervise the part locator in the re-ID model. We propose a method that perceives the visibility of body parts without relying on extra part localization models. Qualitative experiments show that this method can perceive the head, upper body, and lower body. Quantitative experiments validate the effectiveness of our method in both holistic and partially visible scenarios.
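The visibility-based label smoothing described in item (3) can be illustrated with a minimal sketch. The abstract only states that the IoU between the ground-truth box and the detected box measures body visibility and that this visibility smooths the identity label; the exact smoothing rule is not given here. The sketch below therefore assumes boxes in (x1, y1, x2, y2) form and assumes a target that mixes the one-hot label with a uniform distribution, weighted by visibility. The function names `box_iou` and `visibility_smoothed_label` are hypothetical and are not taken from the thesis.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def visibility_smoothed_label(identity, num_classes, visibility):
    """Soft identity target: visibility * one-hot + (1 - visibility) * uniform.

    ASSUMPTION: this mixing rule is an illustrative choice, not the rule
    from the thesis. With visibility = 1 it reduces to the one-hot label.
    """
    if not 0 <= identity < num_classes:
        raise ValueError("identity must be in [0, num_classes)")
    v = min(max(visibility, 0.0), 1.0)  # clamp to [0, 1]
    uniform_share = (1.0 - v) / num_classes
    target = [uniform_share] * num_classes
    target[identity] += v
    return target


# Usage: a detected box that covers only the upper half of the annotated box.
ground_truth_box = (0.0, 0.0, 10.0, 20.0)
detected_box = (0.0, 0.0, 10.0, 10.0)
visibility = box_iou(ground_truth_box, detected_box)
soft_label = visibility_smoothed_label(identity=1, num_classes=4, visibility=visibility)
```

Under these assumptions, a fully visible pedestrian keeps the ordinary one-hot target, while a half-visible one receives a softer target, so the training loss asks for less confidence on partially visible samples.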
Keywords	person re-identification; representation learning; pedestrian occlusion; pedestrian detection
Language	Chinese
Seven major directions: sub-direction classification	Image and video processing and analysis
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/44904
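Collection	复杂系统认知与决策实验室_智能系统与工程 (Laboratory of Cognition and Decision for Complex Systems_Intelligent Systems and Engineering)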
Recommended citation (GB/T 7714)	杨文杰. 行人再识别的特征表达研究[D]. 中国科学院自动化研究所. 中国科学院自动化研究所,2021.

Files in this item
File name/size	Document type	Version type	Access type	License
行人再识别的特征表达研究.pdf (33650 KB)	Thesis		Open access	CC BY-NC-SA