Person Re-identification Based on Pose Information


	Person Re-identification Based on Pose Information
	贾力榜
	2021-05-28
Pages	58
Degree type	Master's
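Chinese abstract (translated)	With the rapid development of artificial intelligence, intelligent video surveillance has become an important part of urban security systems. In recent years, face recognition has matured and is widely used in surveillance systems to verify pedestrian identity. However, real-world video surveillance systems cannot guarantee clear face images in every kind of complex environment, so verifying and recognizing pedestrian identity from whole-body information becomes particularly important. Person re-identification is a technique that uses computer vision to search for a specific pedestrian across different cameras, and it is of great significance in video surveillance scenarios such as smart cities. As a research hotspot in intelligent video analysis, person re-identification has attracted many researchers, who have achieved fruitful results. In practical applications, however, camera parameters, occlusion, illumination, and pose differences cause the same target to look very different across cameras, and recognition accuracy is low. This thesis therefore focuses on how to obtain identity-related, pose-independent, highly robust pedestrian features in cross-view scenarios. It adopts pose-guided pedestrian alignment for person re-identification and introduces the squeeze-and-excitation module and the aggregated-transformation neural network to improve the feature extraction ability of the re-identification network, effectively reducing the impact of pose differences on recognition accuracy. The specific content and contributions are as follows: (1) A person re-identification method based on the squeeze-and-excitation attention module is proposed. To extract identity-related, pose-independent pedestrian features, this thesis uses a pose-guided re-identification baseline network to achieve automatic, high-accuracy recognition of pedestrian identity. Considering that ResNet50, the backbone of the baseline model, is weak at extracting specific information, we introduce the squeeze-and-excitation attention module to focus on learning information related to pedestrian identity and to suppress irrelevant information. The module establishes relationships between feature channels so that, starting from global information, it adaptively learns feature representations that are more relevant to identity recognition. We carried out comparative experiments on three public datasets, Market-1501, DukeMTMC-reID, and CUHK03. The results show that top-1 accuracy improves by at least 0.7 percentage points and mAP by at least 1.0 percentage points, demonstrating the effectiveness of the model. (2) A person re-identification method based on aggregated transformations and the convolutional block attention module is proposed. This thesis uses a pose-normalization network as the baseline and augments the original dataset by generating images in 8 canonical poses, further improving the robustness and recognition accuracy of the re-identification model. We first replace every ResNet50 structure in the original model with the aggregated-transformation deep neural network and then introduce the lightweight convolutional block attention module. The aggregated-transformation network combines the stacking idea of VGG with the split-transform-merge strategy of Inception, improving re-identification accuracy without increasing network depth or width. To verify the model, comparative experiments are carried out on the Market-1501, DukeMTMC-reID, and CUHK03 datasets. The results show that the improved ResNeXt50 attention network helps improve the robustness of the model and achieves better recognition accuracy.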
English abstract	Person re-identification is a sub-problem of image retrieval that aims to use computer vision to determine whether a specific pedestrian is present in an image or video sequence. As a research hotspot in intelligent video analysis, it has attracted many researchers, who have achieved fruitful results. In practical applications, however, camera parameters, pedestrian occlusion, and pose differences cause the same target to look very different across cameras. This thesis therefore focuses on how to obtain identity-related, pose-independent pedestrian features with high robustness in cross-view scenes. A person re-identification method based on pose-guided pedestrian alignment is proposed. The squeeze-and-excitation module and the aggregated-transformation neural network are introduced to improve the feature extraction ability of the re-identification network and to effectively reduce the impact of pose differences on recognition accuracy. The details are as follows: (1) Person re-identification based on the squeeze-and-excitation module. In this chapter, we use FD-GAN as the baseline network. Considering that the bottleneck unit of ResNet50, the backbone of the model, ends with only a simple addition operation and is weak at extracting specific information, we introduce a squeeze-and-excitation (SE) attention module. The SE module establishes relationships between feature channels so that, starting from global information, it adaptively learns feature representations that are more relevant to identity recognition. We carried out comparative experiments on the Market-1501, DukeMTMC-reID, and CUHK03 datasets. The results show that top-1 accuracy improves by at least 0.7 percentage points and mAP by at least 1.0 percentage points, which demonstrates the effectiveness of the model. (2) Person re-identification based on the ResNeXt module. In this chapter, the PG-GAN network is used as the baseline network. To further improve the robustness and recognition accuracy of the person re-identification model, we first replace the ResNet50 structure of the original model with ResNeXt50 and then introduce the lightweight convolutional block attention module (CBAM). Compared with traditional neural networks, ResNeXt combines the stacking idea of VGG with the split-transform-merge strategy of Inception, which improves the accuracy of person re-identification without increasing the depth or width of the network. Comparative experiments on the Market-1501, DukeMTMC-reID, and CUHK03 datasets show that the improved ResNeXt50 attention network helps improve the robustness of the model and achieves better recognition accuracy.
Keywords	convolutional neural network; pose estimation; attention mechanism; person re-identification; data augmentation
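Language	Chinese
Seven major directions: sub-direction classification	Machine learning
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/45054
Collection	Research Center for Intelligent Manufacturing Technology and Systems_Multidimensional Data Analysis (彭思龙) Technical Team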
Recommended citation (GB/T 7714)	贾力榜. 基于姿态信息的行人重识别[D]. 中国科学院自动化研究所. 中国科学院大学,2021.
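
The squeeze-and-excitation (SE) module named in the abstract reweights feature channels using a gate computed from global information. The following is a minimal sketch of that general mechanism in plain Python, not the thesis's implementation: the function name `se_block`, the nested-list layout, the toy weights, and the omission of biases are all assumptions made for illustration.

```python
import math


def se_block(feature_map, w1, w2):
    """Squeeze-and-excitation over a feature map shaped [C][H][W] (nested lists).

    w1: [C_reduced][C] weights of the reduction layer (hypothetical values).
    w2: [C][C_reduced] weights of the expansion layer (hypothetical values).
    Biases are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: reduce dimensionality, then apply ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: restore dimensionality, then apply a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every activation in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Toy input: 2 channels of 2x2 activations, reduced to 1 hidden unit.
features = [
    [[1.0, 2.0], [3.0, 4.0]],  # channel 0
    [[0.0, 0.0], [0.0, 0.0]],  # channel 1
]
w1 = [[1.0, 1.0]]
w2 = [[1.0], [-1.0]]
scaled, gates = se_block(features, w1, w2)
```

In a real network the two weight matrices are learned, and the block is typically placed inside each residual unit before the addition; the toy weights here only show that channels can receive different gates.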

Files in this item
File name/size	Document type	Version type	Access type	License
Thesis_527.pdf (3447KB)	Thesis		Open access	CC BY-NC-SA
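
The abstract reports its gains as top-1 accuracy and mAP. As a rough illustration of how these two retrieval metrics are commonly computed, here is a small sketch in plain Python. The function name `rank1_and_map` and the input layout are hypothetical, and the sketch deliberately omits the same-camera filtering that standard re-identification evaluation protocols apply, so it should not be read as the evaluation code used in the thesis.

```python
def rank1_and_map(ranked_matches):
    """Compute top-1 accuracy and mean average precision.

    ranked_matches: one list per query. Each list holds booleans for the
    gallery images sorted by descending similarity (True = same identity).
    Queries with no true match in the gallery are skipped for both metrics.
    """
    rank1_hits = 0
    average_precisions = []
    for matches in ranked_matches:
        hits = 0
        precisions = []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                # Precision measured at the rank of each true match.
                precisions.append(hits / rank)
        if not precisions:
            continue  # no true match for this query
        if matches[0]:
            rank1_hits += 1
        average_precisions.append(sum(precisions) / len(precisions))
    if not average_precisions:
        raise ValueError("no query has a true match in the gallery")
    valid_queries = len(average_precisions)
    return rank1_hits / valid_queries, sum(average_precisions) / valid_queries


# Two toy queries, each ranked against a three-image gallery.
queries = [
    [True, False, True],   # true matches at ranks 1 and 3
    [False, True, False],  # true match at rank 2
]
rank1, mean_ap = rank1_and_map(queries)
```

Top-1 accuracy only looks at the first ranked image, while mAP rewards placing every true match near the top of the ranking, which is why the two numbers can move by different amounts.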