Research on Key Problems in Person Re-identification (行人再识别关键问题研究)

	Research on Key Problems in Person Re-identification (行人再识别关键问题研究)
	Huang Houjing (黄厚景)
	2020-12
Pages	122
Degree type	Doctoral
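Chinese abstract (translated)	Person re-identification (ReID) is the task of, given an image of a pedestrian, finding the images with the same identity in a gallery of pedestrian images collected from multiple cameras. The number of urban surveillance cameras is growing rapidly, producing massive amounts of surveillance video and creating an urgent need for computer vision techniques that analyze and process video automatically. ReID serves pedestrian retrieval, tracking, identity verification, and other tasks, and is an important technology in video surveillance. However, because of several difficulties in surveillance scenarios, the performance of current ReID algorithms remains far below the level required for practical use. (1) Pedestrians with similar appearance are common, which leads to small inter-class differences, while changes in camera viewpoint and pedestrian pose lead to large intra-class differences. Together these place high demands on the discriminative power of ReID features. (2) When a pedestrian's body is occluded, body positions are often misaligned between images, and the occluding objects introduce severe noise, so matching accuracy is low. (3) Owing to season, weather, region, and other factors, data distributions differ markedly across ReID scenes. When the training and test scenes differ substantially, model performance often drops sharply. This dissertation addresses these three difficulties. The first two works are carried out under the regular ReID setting and aim to improve feature discriminability, by addressing model overfitting and by using part knowledge as auxiliary information, respectively; the later works extend to ReID under the occlusion setting and the cross-scene setting. The four works are summarized as follows. (1) A data augmentation method for ReID based on adversarial occlusion. Because data collection is costly, public ReID datasets tend to be limited in scale, and deep learning methods easily overfit. By analyzing existing ReID models, this dissertation finds that they rely on only local regions of the body for recognition. Although they reach high accuracy on the training set, their results on the test set are unsatisfactory. This dissertation therefore proposes an adversarial occlusion data augmentation method that occludes the regions the model focuses on most to generate new samples, which are added to the original training set for training. The method pushes the model to capture features from more regions of the body, yielding a more comprehensive pedestrian representation and ultimately more discriminative features. Experiments show that the method achieves better recognition results than other methods on ReID datasets. (2) A part knowledge learning method for improving the discriminability of ReID features. During ReID feature learning, knowledge of body structure helps reduce background interference and overcome the effects of pose variation. This dissertation proposes a framework that explicitly learns body structure knowledge. Specifically, the backbone network takes part in both ReID and part segmentation during training. A lightweight segmentation branch is attached after the backbone so that the backbone learns which body part each pixel of the feature map belongs to. Analysis shows that the method reduces feature redundancy between parts and pushes the model to learn features from more parts of the body, ultimately producing more discriminative features. Extensive experiments show that the method significantly improves the performance of existing ReID models. (3) An occluded ReID framework based on part segmentation and multi-task learning. In surveillance scenarios, occlusion of the pedestrian's body is fairly common. On the one hand, occluding objects cause the extracted features to contain severe noise. On the other hand, comparing occluded images directly with complete images leads to misaligned body parts. This dissertation proposes using part segmentation to determine the positions of body parts, and then performing feature pooling and part visibility estimation. When two images are compared, only the parts visible in both are used for matching. Part segmentation and ReID are trained with multi-task learning, which keeps the algorithm efficient. The proposed method not only reduces the influence of noise from occluding objects but also achieves part alignment, and its performance on occluded ReID datasets substantially exceeds that of other occlusion-handling methods. (4) A domain adaptation method for ReID based on target domain proxy tasks. When a ReID model is tested in a new scene, its performance often drops sharply because of differences in image quality, pedestrian attribute distribution, and other aspects. This dissertation adopts the unsupervised domain adaptation setting, that is, improving the model's test performance on the target domain with the help of target domain images that have no identity labels. By sharing the backbone network, training for attribute recognition and part segmentation on target domain images is carried out at the same time as source domain ReID training. These two body perception tasks, which are related to ReID, help the model extract discriminative appearance features and indirectly improve how well the ReID model adapts to target domain images; this dissertation therefore calls them proxy tasks. Experiments confirm that the proxy tasks significantly improve cross-domain ReID performance and can be combined with existing cross-domain methods to achieve excellent final performance.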
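Illustrative note on contribution (1), not taken from the dissertation: the abstract says that new training samples are produced by occluding the regions a trained model focuses on, but it does not say how those regions are located. The sketch below assumes one simple possibility, a sliding-window occlusion test that picks the window whose masking lowers the model's score the most. The names `occlude`, `find_emphasized_window`, `adversarially_occluded_sample`, and the callable `score_fn` (standing in for a trained model's confidence in the correct identity) are hypothetical, and images are toy nested lists rather than real tensors.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # toy grayscale image: rows of pixel values


def occlude(image: Image, top: int, left: int, height: int, width: int,
            fill: float = 0.0) -> Image:
    """Return a copy of `image` with one rectangle overwritten by `fill`."""
    out = [row[:] for row in image]
    for r in range(top, min(top + height, len(out))):
        for c in range(left, min(left + width, len(out[r]))):
            out[r][c] = fill
    return out


def find_emphasized_window(image: Image,
                           score_fn: Callable[[Image], float],
                           height: int, width: int,
                           stride: int = 1) -> Tuple[int, int]:
    """Assumed localisation step: slide a window over the image and return
    the (top, left) position whose occlusion lowers `score_fn` the most."""
    best_pos, best_score = (0, 0), float("inf")
    n_rows, n_cols = len(image), len(image[0])
    for top in range(0, n_rows - height + 1, stride):
        for left in range(0, n_cols - width + 1, stride):
            score = score_fn(occlude(image, top, left, height, width))
            if score < best_score:
                best_pos, best_score = (top, left), score
    return best_pos


def adversarially_occluded_sample(image: Image,
                                  score_fn: Callable[[Image], float],
                                  height: int, width: int,
                                  stride: int = 1) -> Image:
    """Build one extra training sample by masking the most relied-on window."""
    top, left = find_emphasized_window(image, score_fn, height, width, stride)
    return occlude(image, top, left, height, width)
```

In a training pipeline along these lines, each generated sample would keep the identity label of its source image and be added to the original training set, as the abstract describes.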
English abstract	Given a pedestrian image as a query, person re-identification (ReID) aims to retrieve images of the same person from those captured by all cameras. With the rapid increase in surveillance cameras in cities, massive amounts of video are produced, and computer vision is required to process the data automatically. ReID is an important technique that supports pedestrian retrieval, tracking, and verification, among other tasks. Because of several difficulties in surveillance scenarios, the performance of current ReID models is still far from the level required for practical use. 1) Pedestrians with similar appearance abound, which reduces inter-class distance, while the same person appears different under variations in pose and camera viewpoint, which increases intra-class distance. Together, these two difficulties demand strongly discriminative features. 2) Under occlusion, body locations in different images are not aligned, and occluding objects add severe noise to the extracted features. As a result, it is challenging to obtain high matching accuracy under occlusion. 3) Owing to variations in season, weather, region, and other factors, data distributions differ markedly across scenes, and ReID models tend to drop significantly in performance when tested in new scenes. This dissertation focuses on the above challenges. The first two contributions are made under the regular ReID setting and aim at discriminative feature learning, by addressing model overfitting and by adopting part knowledge, respectively; the last two contributions extend to the occlusion and cross-scene settings, respectively. The contributions are summarized as follows. (1) A data augmentation method based on adversarial occlusion for ReID. ReID training sets are relatively small because identity annotation is expensive, which makes deep learning models vulnerable to overfitting. We find that existing ReID models focus on only a small portion of the body for recognition.
Although accuracy is high on the training set, it drops drastically on the test set. We propose to augment the training samples by occluding the regions that a trained model emphasizes, i.e., in an adversarial manner. The method urges the model to capture features from more regions, making the representation more comprehensive and thus more discriminative. We achieve performance superior to previous methods on ReID datasets. (2) A part knowledge learning framework to improve the discriminative ability of ReID features. For feature learning, body knowledge helps reduce the influence of background and pose variation. This dissertation proposes to explicitly learn body knowledge in ReID models. Concretely, we let the backbone serve both ReID and part segmentation during training. By attaching a lightweight segmentation head to the backbone, we guide the backbone to learn which body part each pixel of the feature map belongs to. The method decreases feature redundancy between parts and helps the ReID model learn from more body regions, leading to improved discriminative ability. Extensive experiments demonstrate significant improvements for existing ReID models. (3) Human part segmentation based alignment with multi-task learning for occluded ReID. In surveillance scenarios, the human body is frequently partially occluded. Occlusion not only brings severe noise to the extracted features, but also causes misalignment when partial images are matched directly against fully visible images. This dissertation proposes to use part segmentation to localize body regions, and then to pool features and predict part visibility accordingly. When a pair of images is compared, only the parts visible in both images are considered. We implement ReID and segmentation with multi-task learning, which makes the algorithm efficient. The method not only mitigates the noise brought by occlusion but also ensures part alignment, surpassing other methods by a large margin on occlusion datasets.
(4) Target domain proxy task learning for cross-domain ReID. Applying a ReID model to scenes unseen during training often incurs a large performance drop because of domain discrepancies in image quality, pedestrian attributes, and other factors. This dissertation considers the setting of unsupervised domain adaptation, in which target domain images without identity labels are available to improve the model. While training ReID on the source domain, we simultaneously train part segmentation and attribute recognition on target images, with a shared backbone. These two body perception tasks are beneficial for learning discriminative features and help the model adapt to the target domain in an indirect manner; we therefore call them proxy tasks. Experiments show that the proxy tasks significantly improve cross-domain ReID and can be combined with existing cross-domain methods.
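Illustrative note on contribution (3), not taken from the dissertation: the abstract states that only the parts visible in both images are used when a pair is compared. A minimal sketch of such a comparison follows. It assumes each image has already been reduced to a list of per-part feature vectors plus per-part visibility flags; the use of Euclidean distance, the averaging over shared parts, and the infinite distance returned when no part is shared are all assumptions, and the function name `shared_part_distance` is hypothetical.

```python
import math
from typing import List, Sequence


def shared_part_distance(feats_a: Sequence[Sequence[float]],
                         vis_a: Sequence[bool],
                         feats_b: Sequence[Sequence[float]],
                         vis_b: Sequence[bool]) -> float:
    """Average per-part Euclidean distance over parts visible in BOTH images.

    feats_x[i] is the feature vector of body part i, and vis_x[i] says
    whether part i is visible in that image.
    """
    shared: List[int] = [i for i, (va, vb) in enumerate(zip(vis_a, vis_b))
                         if va and vb]
    if not shared:
        # Assumption: with no commonly visible part the pair is treated
        # as maximally distant.
        return float("inf")
    total = sum(math.dist(feats_a[i], feats_b[i]) for i in shared)
    return total / len(shared)


# Toy usage: part 2 is occluded in image A, so it is ignored in the match.
query_feats = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
query_vis = [True, True, False]
gallery_feats = [[3.0, 4.0], [1.0, 1.0], [0.0, 0.0]]
gallery_vis = [True, True, True]
distance = shared_part_distance(query_feats, query_vis,
                                gallery_feats, gallery_vis)
```

Because occluded parts never enter the sum, features pooled from occluding objects do not contribute noise, and part i is always compared with part i, which is the alignment property the abstract describes.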
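Keywords	person re-identification, discriminative features, pedestrian occlusion, domain adaptation
Language	Chinese
Seven major directions: sub-direction classification	Image and video processing and analysis
Document type	Dissertation
Item identifier	http://ir.ia.ac.cn/handle/173211/42208
Collection	复杂系统认知与决策实验室_智能系统与工程 (Laboratory of Cognition and Decision Intelligence for Complex Systems_Intelligent Systems and Engineering)
Recommended citation (GB/T 7714)	黄厚景. 行人再识别关键问题研究[D]. 中国科学院自动化研究所. 中国科学院自动化研究所,2020.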

Files in this item
File name/size	Document type	Version type	Access type	License
行人再识别关键问题研究.pdf (11976 KB)	Dissertation		Open access	CC BY-NC-SA