Research on Person Re-identification Methods Oriented Toward Sample Diversity (面向样本多样性的行人重识别方法研究)
李耀宇
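Subtype: Doctoral
Thesis Advisor: 徐常胜
Date: 2021-05-29
Degree Grantor: University of Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Name: Doctor of Engineering
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: person re-identification; global feature learning; local information mining; graph convolutional network; unsupervised domain adaptation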
Abstract

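With the advance of urbanization, large numbers of surveillance cameras have been deployed in crowded public places, forming large-scale distributed surveillance networks that produce massive amounts of video data. Traditional manual processing can no longer cope with data at this scale. With the development of artificial intelligence, there is growing interest in using techniques from computer vision, machine learning, and pattern recognition to build intelligent video surveillance systems that can quickly and accurately find useful information in surveillance data. Person re-identification, a key technique for achieving this goal, aims to match pedestrians across different camera views, enabling fast pedestrian retrieval from large-scale video data. Because of its important role in pedestrian trajectory tracking and behavior analysis, person re-identification has attracted wide attention from researchers.

Although current person re-identification research has achieved outstanding results with deep learning, it still faces two major challenges in practical applications: (1) Diverse variation within a domain. Pedestrian data within the same dataset are easily affected by pose changes, illumination changes, cluttered backgrounds, resolution differences, and partial occlusion, so a pedestrian's appearance varies greatly across cameras; images of different pedestrians can even look more similar than different images of the same pedestrian. (2) Differences in data distribution between domains. Different datasets are captured in different environments with different camera configurations, so the collected data differ considerably. These differences make person re-identification models perform very poorly across datasets. This thesis studies the person re-identification task in light of these challenges. First, for the problem of diverse intra-domain variation, it studies pose-invariant learning, adaptive feature fusion, and structured feature learning in the supervised setting. Then, for the performance degradation caused by cross-domain distribution differences, it explores two unsupervised domain adaptation methods: target-domain consistency enhancement and heterogeneous learning in a triplet model.

The main work and contributions of the thesis are summarized as follows: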

 

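1. Person re-identification based on pose-invariant learning.

Pose variation is the main factor behind changes in pedestrian appearance, and the resulting changes in the positions of body parts also cause pedestrian features to be misaligned. To address these problems, this thesis proposes a pose-invariant learning method based on generative adversarial networks. The method first uses an adversarial network to disentangle identity information from pose and generates pedestrian images that preserve identity while allowing the pose to be controlled. The generated data are then used to assist in training the feature extraction network. When a pedestrian's feature is extracted, information from the current image and from several corresponding generated images in arbitrary poses is fused, and the fused feature is highly robust to pose variation.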

 

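2. Person re-identification based on adaptive feature fusion.

Existing methods ignore the relationships between images when generating pedestrian features. To make better use of the similarity relationships among neighboring images and learn more robust global features, this thesis proposes an adaptive feature fusion method based on graph convolutional neural networks. The method builds a graph over the current sample and its nearest neighbors and then performs graph convolution on this graph, which passes information between nodes and fuses and updates the node features. Because the fused features take information from neighboring samples into account, they are more robust and discriminative on complex and diverse pedestrian data.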

 

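3. Person re-identification based on structured feature learning.

The key to person re-identification is generating robust and discriminative pedestrian features. Existing methods remain susceptible to background interference while mining local information and ignore the correlations among local features, so this thesis proposes a structured feature learning model. The model uses human parsing to separate the different semantic regions of a pedestrian from the background, avoiding background interference as far as possible. A graph neural network then models the correlations among the features of the different semantic regions, yielding a structured pedestrian feature. Combining the global feature, the local semantic features, and the structured feature further strengthens the pedestrian representation and improves matching accuracy.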

 

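4. Cross-domain person re-identification based on target-domain consistency enhancement.

Because data distributions differ between domains, the performance of a person re-identification model drops severely on an unseen target-domain dataset. Existing cross-domain methods mainly align the data distributions of the source and target domains and do not fully consider the latent constraints among samples within the target domain. To address this problem, this thesis proposes a learning framework based on the Mean-Teacher ensemble model that exploits the instance-ensembling consistency and cross-granularity consistency constraints of unlabeled target-domain samples. By combining the two consistency constraints, the ensemble model can mine the latent constraints of unlabeled samples on multi-granularity image features, thereby enhancing unsupervised feature learning in the target domain.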

 

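5. Cross-domain person re-identification based on heterogeneous learning in a triplet ensemble model.

Self-ensembling models have achieved good results on semi-supervised and domain adaptation problems, but their tightly coupled structure limits the model's descriptive power. In addition, when applied to unsupervised cross-domain person re-identification, self-ensembling models are affected by noise in the target-domain pseudo labels. To address these problems, this thesis proposes a triplet ensemble model, which adds a second student network to the Mean-Teacher model and exchanges knowledge through heterogeneous learning between the two student networks, building a closed-loop learning mechanism in the model. This closed-loop mechanism effectively avoids tight coupling in the model and reduces the influence of pseudo-label noise in the target domain.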

Other Abstract

With the advance of urbanization, large numbers of surveillance cameras have been deployed in crowded public places, forming large-scale distributed surveillance networks that provide massive amounts of video data. Traditional manual processing can no longer cope with surveillance data at this scale. With the development of artificial intelligence, researchers are trying to use techniques from computer vision, machine learning, and pattern recognition to build intelligent surveillance systems that can quickly and accurately find useful information in surveillance data. Person re-identification, a key technique for achieving this goal, aims to match pedestrians across different camera views and thereby enable fast pedestrian retrieval from large-scale video data. It has attracted increasing research interest because of its important role in pedestrian trajectory tracking and behavior analysis.

Although current research on person re-identification has achieved strong performance with deep learning, it still faces two major challenges in practical applications: (1) Diversity of data within a domain. Data collected from the same camera network vary with pose, illumination, image resolution, and background clutter, so a pedestrian's appearance changes greatly across camera views; images of different pedestrians can even be more similar in appearance than different images of the same pedestrian. (2) Differences in data distribution between domains. Data collected from different camera networks differ considerably because of different shooting environments and camera configurations. These differences make person re-identification models perform very poorly across datasets. To address these problems, this thesis studies two aspects, namely representation learning for complex data within a domain and unsupervised domain adaptation for cross-domain distribution differences, and proposes a series of related methods.

The major contributions of this thesis are summarized as follows:

 

1. Pose-invariant representation learning for person re-identification.

Pose variation is the main factor behind changes in a pedestrian's appearance, and the resulting shifts in the positions of body parts can cause feature misalignment in pedestrian matching. To address these problems, we propose a pose-invariant representation learning framework based on generative adversarial networks. First, we disentangle the identity and pose information in an image with an adversarial network and generate pedestrian images in arbitrary poses. Second, the generated data are used to train the feature extraction network. At test time, the features of the original image and of its corresponding generated images are fused, and the fused feature is more robust to pose variation.
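
The abstract does not say how the original and generated-image features are fused. As a minimal sketch, assuming the simplest choice of averaging followed by L2 normalization (the function names and toy vectors are hypothetical, not taken from the thesis), the test-time fusion step could look like this:

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def fuse_features(original, generated):
    """Average the original feature with the generated-image features.

    Assumption: plain averaging. The thesis may use a learned fusion instead.
    """
    all_feats = [original] + list(generated)
    dim = len(original)
    mean = [sum(f[i] for f in all_feats) / len(all_feats) for i in range(dim)]
    return l2_normalize(mean)


# Toy 2-D features: one original image and two generated poses.
original = [1.0, 0.0]
generated = [[0.0, 1.0], [1.0, 1.0]]
fused = fuse_features(original, generated)
```

In practice the features would come from the trained feature extraction network, and the fused vector would replace the single-image feature when computing distances between query and gallery images.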

 

2. Adaptive feature fusion for person re-identification.

Existing methods ignore the relationships between images when generating person representations. To learn more robust global features from the similarity relationships among nearest neighbors, we propose an adaptive feature fusion method based on graph convolutional neural networks. We first construct a graph from a given sample and its nearest neighbors and then perform graph convolution on it to propagate information and fuse features among nodes. Because the fused features incorporate identity information from the nearest neighbors, they are more robust and discriminative on complex and diverse pedestrian data.
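
The neighbor-graph idea can be illustrated with a single propagation step. The sketch below is not the thesis's network: it assumes cosine similarity for neighbor selection and softmax-normalized edge weights, and it omits the learned weight matrix and nonlinearity that a real graph convolution layer would have. The names `fuse_with_neighbors`, `k`, and `temperature` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def fuse_with_neighbors(query, gallery, k=2, temperature=0.1):
    """One aggregation step on a star graph: the query plus its k nearest neighbors."""
    sims = [cosine(query, g) for g in gallery]
    # Indices of the k most similar gallery features.
    top = sorted(range(len(gallery)), key=lambda i: sims[i], reverse=True)[:k]
    nodes = [query] + [gallery[i] for i in top]
    # The self-loop gets similarity 1.0 (assumes a non-zero query vector).
    scores = [1.0] + [sims[i] for i in top]
    # Softmax turns similarities into edge weights that sum to 1.
    exps = [math.exp(s / temperature) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [sum(w * n[d] for w, n in zip(weights, nodes)) for d in range(len(query))]
    return fused, top, weights


query = [1.0, 0.0]
gallery = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.2]]
fused, top, weights = fuse_with_neighbors(query, gallery, k=2)
```

Here the dissimilar gallery feature `[0.0, 1.0]` is excluded from the graph, and closer neighbors receive larger weights, which is the sense in which the fusion adapts to each sample's neighborhood.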

 

3. Structured representation learning for person re-identification.

The key to person re-identification is generating a robust and discriminative person representation. Existing methods suffer from cluttered background information while mining local information and ignore the relationships among local features, so we propose a structured feature learning model. The model uses human semantic parsing to separate the semantic regions of the human body from the background, avoiding background interference as far as possible. A graph neural network then models the relationships among the features of the different semantic regions and generates the structured feature. Combining the global feature, the local semantic features, and the structured feature further enhances the robustness of the final feature and improves person re-identification performance.
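
To make the role of the parsing map concrete, the following sketch shows only the first step, extracting one feature per semantic region while ignoring the background. It assumes masked average pooling over a feature map with a per-pixel integer label map (0 for background); the function name and the tiny 2x2 example are hypothetical, and the subsequent graph neural network is not shown.

```python
def region_pooling(feature_map, label_map, num_regions):
    """Average the feature vectors of the pixels in each semantic region.

    feature_map: H x W x C nested lists; label_map: H x W integer labels,
    where 0 is background (ignored) and 1..num_regions are body regions.
    """
    channels = len(feature_map[0][0])
    sums = [[0.0] * channels for _ in range(num_regions)]
    counts = [0] * num_regions
    for feat_row, label_row in zip(feature_map, label_map):
        for feat, label in zip(feat_row, label_row):
            if label == 0:
                continue  # background pixels do not contribute
            r = label - 1
            counts[r] += 1
            for c in range(channels):
                sums[r][c] += feat[c]
    # Regions with no pixels get an all-zero feature.
    return [
        [s / counts[r] if counts[r] else 0.0 for s in sums[r]]
        for r in range(num_regions)
    ]


feature_map = [[[1.0, 0.0], [3.0, 2.0]],
               [[5.0, 5.0], [2.0, 4.0]]]
label_map = [[1, 1],
             [0, 2]]
regions = region_pooling(feature_map, label_map, num_regions=2)
```

The background pixel with feature `[5.0, 5.0]` has no effect on either region feature. The resulting per-region vectors are what a graph model over semantic regions would take as its node features.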

 

4. Intra-domain consistency enhancing for cross-domain person re-identification.

Because data distributions differ between domains, the performance of a person re-identification model declines severely on an unseen target dataset. Existing domain adaptation methods for person re-identification mainly try to align the data distributions of the source and target domains, but do not fully exploit the latent constraints among unlabeled target samples. To address this problem, we propose an intra-domain consistency enhancing framework based on the Mean-Teacher model, which explores the instance-ensembling consistency and cross-granularity consistency constraints of unlabeled samples in the target domain. By combining the two consistency constraints, the proposed model can mine the latent constraints of unlabeled samples on multi-granularity features and thus enhance unsupervised feature learning in the target domain.
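
For readers unfamiliar with the Mean-Teacher model, its two generic ingredients are a teacher whose weights are an exponential moving average (EMA) of the student's weights, and a consistency loss between student and teacher outputs on the same unlabeled sample. The sketch below shows only these generic pieces on flat lists of numbers; the thesis's specific instance-ensembling and cross-granularity constraints are not described in enough detail to reproduce, and the mean-squared-error loss and the `alpha` values are assumptions.

```python
def ema_update(teacher, student, alpha=0.99):
    """Move each teacher weight toward the corresponding student weight."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_out, teacher_out):
    """Mean squared error between student and teacher outputs (assumed loss)."""
    return sum((s - t) ** 2 for s, t in zip(student_out, teacher_out)) / len(student_out)


# Toy weights: the teacher starts at zero and tracks a fixed student.
teacher = [0.0, 0.0]
student = [1.0, 2.0]
for _ in range(3):
    teacher = ema_update(teacher, student, alpha=0.5)

# Toy outputs for one unlabeled target-domain sample.
loss = consistency_loss([0.2, 0.8], [0.4, 0.6])
```

In a real training loop the student is updated by gradient descent on the consistency loss (plus any supervised or pseudo-label losses), and `ema_update` is applied to the teacher after every student step, typically with `alpha` close to 1.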

 

5. Heterogeneous learning in a triplet ensemble model for cross-domain person re-identification.

The self-ensembling model has achieved good performance in semi-supervised and domain adaptation scenarios, but its tightly coupled structure limits the model's descriptive ability. In addition, when applied to unsupervised cross-domain person re-identification, the self-ensembling model is affected by noise in the target-domain pseudo labels. To address these problems, we propose a triplet ensemble model that adds a second student network to the Mean-Teacher model and exchanges knowledge through heterogeneous learning between the two student networks. With an ensemble consistency constraint and the heterogeneous learning strategy, we build a closed-loop learning mechanism in the model. This mechanism effectively keeps the model from becoming tightly coupled and mitigates the negative effect of pseudo-label noise.

Pages: 156
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/44923
Collection: National Laboratory of Pattern Recognition
Recommended Citation
GB/T 7714
李耀宇. 面向样本多样性的行人重识别方法研究[D]. 中国科学院自动化研究所. 中国科学院大学,2021.
Files in This Item:
File Name/Size: 面向样本多样性的行人重识别方法研究.pd (5023KB)
DocType: Thesis
Access: Open Access
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.