CASIA OpenIR
Title: Research and Application of Cross-Camera Person Re-identification (跨摄像头行人重识别问题研究与应用)
Author: Cao Min (曹敏)
Subtype: Doctoral dissertation
Thesis Advisor: Peng Silong (彭思龙)
Date: 2019-11-29
Degree Grantor: Institute of Automation, Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Discipline: Pattern Recognition and Intelligent Systems
Keywords: person re-identification; metric learning; re-ranking; singularity problem; region-specific metric; ranking loss function; group information; neighbor samples
Abstract

      Person re-identification aims to match images of the same pedestrian captured by different cameras. Automating person re-identification is an indispensable task in realizing intelligent video surveillance systems. With the continuous improvement of such systems, person re-identification has attracted broad attention from researchers in computer vision and machine learning. Because pedestrians captured by different cameras undergo changes in pose, illumination, viewpoint, and background, the same person may exhibit different appearances across cameras; moreover, different people captured by the same camera may share similar appearances. These factors make person re-identification a challenging task.

      To address these problems and build a high-performance person re-identification system, researchers have made continuous efforts. Feature extraction and metric learning are typically the key steps of such a system, and research has targeted one or both of them. In recent years, re-ranking has become an important additional step for further improving re-identification performance and has attracted growing attention. This dissertation studies the metric learning and re-ranking steps in detail. Its main contributions are summarized as follows:

(1) An effective approach to person re-identification is to learn a discriminative metric by simultaneously minimizing the within-class scatter and maximizing the between-class scatter. However, the dimensionality of the feature vectors usually exceeds the number of training samples, so the within-class scatter matrix is singular and a good metric cannot be learned. This dissertation proposes to solve this singularity problem by exploiting the pseudo-inverse of the within-class scatter matrix and learning an orthogonal transformation for the metric. The proposed method has two advantages: the model admits a closed-form solution, and no hyperparameters need to be tuned. In addition, a kernelized version is developed for the nonlinear characteristics of person re-identification, and a fast version is developed for more efficient optimization. Experiments verify the effectiveness and superiority of the proposed method in solving the singularity problem and analyze the performance of the kernelized and fast versions. Extensive comparative experiments on four person re-identification datasets demonstrate the state-of-the-art performance of the proposed method.

(2) The input of the metric learning step is the feature vectors of pedestrian images. Based on the goal of person re-identification, namely that the correctly matched pair should rank first among all pairs, researchers construct an objective loss function and optimize it to obtain a discriminative metric. Feature vectors of pedestrian images are often extracted from different regions of the image. With such features as input, most metric learning methods learn a region-generic transformation matrix, i.e., all region features share a homogeneous transformation, which ignores the spatial structure of pedestrian images and the distribution differences among region features. This dissertation therefore proposes a novel region-specific metric learning method that learns region-specific transformation matrices by separately optimizing a sub-model for each region. For the objective loss, this dissertation translates the goal of person re-identification directly into mathematical language, learning an optimal feature mapping by minimizing the difference between the distance of the correctly matched pair and the minimum distance over all pairs. Compared with other loss functions, which model only a subset of samples, the proposed loss models the ranking goal of person re-identification in the most direct and intuitive way and acts on the entire gallery set. Extensive experiments on person re-identification datasets demonstrate the effectiveness and superiority of the proposed method.

(3) To further improve re-identification performance, this dissertation exploits the contextual information of samples on top of the ranking results obtained from individual information. From the spatio-temporal perspective, group members can provide visual cues for person re-identification. This dissertation examines the essence of group-based re-identification and proposes a generalized group definition, together with a pair-matching scheme for measuring the distance between generalized groups that is robust to positional changes of group members across cameras. The matching score of a sample pair obtained from individual information is weighted by the matching score between their generalized groups to produce the final score. From the perspective of Euclidean space, by exploring a sample's neighbors, the matching score of a pair can be estimated more accurately along geodesic paths on the data manifold. This dissertation therefore proposes that the similarity of a sample pair is influenced by their neighbors, based on the assumption that two samples are similar to each other if the neighbors of one sample are similar to the other sample. Experiments on different person re-identification datasets verify that the proposed method improves performance and outperforms comparable re-ranking methods in both accuracy and efficiency.

Other Abstract

  Person re-identification (re-id) aims to identify designated individuals from a large number of pedestrian images across non-overlapping camera views, and is a critical task for realizing an intelligent video surveillance system. With the development of video surveillance systems, person re-id has attracted wide attention in the fields of computer vision and machine learning in recent years. However, due to the large intra-class variations caused by changes in illumination, person pose, and occlusion across views, person re-id is a challenging problem. In addition, the similarity in appearance among different people further increases its difficulty in real applications.

  To address these challenges and achieve high performance in a person re-id system, researchers have made continuous efforts. Most existing methods focus on feature extraction and distance metric learning, which are the critical steps of a person re-id system. In recent years, re-ranking has gained attention as an important step for further improving re-id performance. In this dissertation, distance metric learning and re-ranking are studied in detail. The main works and innovations are as follows:

(1) An effective way to achieve person re-id is to learn a discriminative metric by simultaneously minimizing the within-class variance and maximizing the between-class variance. However, the dimension of the feature vector is usually greater than the number of training samples; as a result, the within-class scatter matrix is singular and the metric cannot be learned. We propose to solve this singularity problem by employing the pseudo-inverse of the within-class scatter matrix and learning an orthogonal transformation for the metric. The proposed method can be solved effectively with a closed-form solution and requires no parameters to tune. In addition, we develop a kernel version to handle the non-linearity in person re-id, and a fast version for more efficient solution. In experiments, we verify the validity and advantage of the proposed method for solving the singularity problem in person re-id, and analyze the effectiveness of both the kernel version and the fast version. Extensive comparative experiments on four person re-id benchmark datasets show that the proposed method achieves state-of-the-art results.
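The pseudo-inverse construction described above can be sketched as follows. This is a minimal illustration, not the thesis' exact formulation: the specific recipe of taking eigenvectors of pinv(Sw) @ Sb and orthogonalizing them with a QR factorization is an assumption made for the sketch.

```python
import numpy as np

def learn_metric(X, y):
    """Sketch: learn an orthogonal transformation W from the pseudo-inverse
    of the within-class scatter matrix (illustrative, not the thesis' exact
    closed-form model)."""
    d = X.shape[1]
    mu = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mu)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # When d > number of samples, Sw is singular; the pseudo-inverse
    # replaces the ordinary inverse used in classical discriminant analysis.
    M = np.linalg.pinv(Sw) @ Sb
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(-eigvals.real)
    # QR factorization orthogonalizes the discriminative directions,
    # yielding an orthogonal transformation for the metric.
    W, _ = np.linalg.qr(eigvecs[:, order].real)
    return W  # project features as X @ W before computing distances
```

Note that the whole pipeline involves only one pseudo-inverse, one eigendecomposition, and one QR step, which is consistent with the closed-form, tuning-free character claimed above.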

(2) For distance metric learning, the input data are feature vectors extracted on several regions of the person image; an objective function is then constructed and solved by iterative optimization to satisfy the constraints on the training samples and obtain a discriminative distance metric. With such region-based features as input, most distance metric learning methods learn cross-view transformations that are region-generic, i.e., all region features share a homogeneous transformation; the spatial structure of the person image and the distribution differences among region features are thus ignored. Therefore, we propose a novel region-specific metric learning method in which a series of region-specific sub-models are optimized to learn cross-view region-specific transformations. For the construction of the objective function, by translating the goal of person re-id directly into mathematical language, we propose to learn an optimal feature mapping that minimizes the difference between the distance of the matched pair and the minimum distance over all pairs, namely the Ranking Loss. Compared with other loss functions, the proposed ranking loss optimizes the ultimate ranking goal in the most direct and intuitive way, and it acts directly on the whole gallery set instead of measuring comparatively on a small subset. Extensive experiments on benchmark datasets show the effectiveness of the proposed method compared with state-of-the-art methods.
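The ranking-loss idea, "the matched pair's distance should equal the minimum distance over all pairs", can be sketched numerically. The exact loss form in the thesis may differ; taking the mean gap over probes is an assumption of this sketch.

```python
import numpy as np

def ranking_loss(D, matches):
    """Sketch of the ranking-loss idea: D is an (n_probe, n_gallery) distance
    matrix and matches[i] is the gallery index of probe i's true match.
    The loss is the mean gap between each matched distance and the minimum
    distance over the whole gallery; it is zero exactly when every true
    match ranks first."""
    n = D.shape[0]
    matched = D[np.arange(n), matches]      # distance to the true match
    return float(np.mean(matched - D.min(axis=1)))
```

Because the minimum is taken over the entire gallery row, the loss acts on the whole candidate set at once, matching the claim above that it does not restrict comparison to a small subset of pairs.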

(3) To further enhance the performance of person re-id, we propose to utilize the contextual information of samples based on the initial ranking results obtained by feature extraction and metric learning. From the view of the spatial-temporal domain, group members can provide visual clues for person re-id. We therefore discuss the essentials of group-based person re-id and relax the group definition towards a concept of "co-traveler set". Accordingly, we propose a pair-matching scheme to measure the distance between co-traveler sets, which tackles the problems caused by the dynamic change of groups across camera views. The final individual matching score is weighted by the obtained distance measurements between co-traveler sets. From the view of Euclidean space, the pairwise similarity can be computed more accurately by taking into account the contextual information of each sample and the structure of the data manifold. We propose a context-driven person re-id method in which the pairwise measure is determined by the contextual information provided by each sample's neighbors. The main motivation relies on the conjecture that two samples are similar to each other if their contexts are similar to each other. Experiments conducted on different person re-id datasets show promising improvements of the proposed method compared with state-of-the-art re-ranking methods.
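The neighbor-context conjecture above, that two samples are similar if their neighbor sets are similar, can be sketched as a simple re-ranking step. The Jaccard form of the neighbor comparison and the blending weight alpha are assumptions of this sketch, not the thesis' exact scheme.

```python
import numpy as np

def context_rerank(D, k=3, alpha=0.5):
    """Sketch of context-driven re-ranking: refine an (n, n) all-pairs
    distance matrix D by blending it with a Jaccard distance between the
    samples' k-nearest-neighbor sets (illustrative only)."""
    n = D.shape[0]
    # k nearest neighbors of every sample, excluding the sample itself
    nn = [set(np.argsort(D[i])[1:k + 1]) for i in range(n)]
    J = np.zeros_like(D, dtype=float)
    for i in range(n):
        for j in range(n):
            inter = len(nn[i] & nn[j])
            union = len(nn[i] | nn[j])
            # Jaccard distance: 0 when neighbor sets coincide, 1 when disjoint
            J[i, j] = 1.0 - inter / union
    return alpha * D + (1 - alpha) * J
```

Pairs whose neighbor sets overlap heavily have their distance pulled down, which is the re-ranking effect described above: the pairwise measure is no longer determined by the two samples alone but by their shared context.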

Pages: 98
Language: Chinese (中文)
Document Type: Dissertation (学位论文)
Identifier: http://ir.ia.ac.cn/handle/173211/26239
Collection: Institute of Automation, Chinese Academy of Sciences / Research Center of Intelligent Manufacturing Technology and Systems / Multi-dimensional Data Analysis
Corresponding Author: Cao Min (曹敏)
Recommended Citation
GB/T 7714
Cao Min. Research and Application of Cross-Camera Person Re-identification (跨摄像头行人重识别问题研究与应用) [D]. Institute of Automation, Chinese Academy of Sciences, 2019.
Files in This Item:
File: 跨摄像头行人重识别问题研究与应用.pdf (13205 KB) | DocType: Dissertation | Access: Open Access | License: CC BY-NC-SA
