Research on Multi-Level Feature Extraction and Metric Learning Algorithms for Person Re-identification

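	Research on Multi-Level Feature Extraction and Metric Learning Algorithms for Person Re-identification
	Yang Yang (杨阳)
	2016-05
Degree type	Doctor of Engineering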
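Abstract (translated from Chinese)	Person re-identification is an important task in multi-camera video surveillance. Its goal is to automatically associate pedestrians captured by multiple cameras at different times and places, providing a basis for inferring and analyzing the behavior and activities of pedestrian targets. In video surveillance, variations in illumination, camera viewpoint, and pedestrian pose can make the appearance of the same person differ greatly across cameras, so person re-identification is highly challenging. Current work addresses the problem mainly from two directions: (1) extracting suitable features from a person's appearance that highlight the differences between individuals while remaining robust to intra-class variations in illumination and viewpoint; (2) designing or learning an effective similarity measure for computing the similarity between two images. This dissertation follows both directions and studies multi-level feature extraction and metric learning algorithms in depth. The main contributions are as follows. (1) The robustness of different color spaces to illumination changes is analyzed. Taking both the discriminability and the illumination invariance of color spaces into account, several color spaces are selected and fused to mitigate the sensitivity of any single color space to illumination changes. In addition, in computer vision applications, color names help people analyze images from a semantic perspective. In view of this, a color descriptor based on salient color names, SCNCD, is proposed. The descriptor uses 16 commonly used color names to re-describe the 256$\times$256$\times$256 discrete color points of the original color space, representing the color distribution within a region more concisely. SCNCD-based features are more robust to illumination changes than traditional histogram features, and fusing them with histogram features further improves re-identification accuracy. (2) Inspired by the bag-of-words model, a supervised coding strategy for high-level feature learning is proposed. The method uses the relationships (similar or dissimilar) between pedestrians in the training set to learn a discriminative dictionary and build a metric-embedded high-level feature representation, so that images of the same person are closer to each other than images of different people. Compared with the original low-level features, the high-level features are more discriminative and improve re-identification performance. (3) Because methods based on single-level features have limited representational power, a weighted linear coding (WLC) method is proposed to improve it; WLC learns pixel-level, patch-level, and image-level descriptors from the raw image data. Because WLC encodes all data from the same region together, it preserves spatial consistency. In this method, the pixel-level features represent color information, the patch level corresponds to local shape patterns, and the image level provides holistic information about the person. To fully exploit the complementarity of the features at different levels, a score-level fusion strategy is adopted, which substantially improves re-identification accuracy. (4) Metric learning is a general approach to computing the similarity relation between a pair of samples. However, existing metric learning methods consider only the difference between the two samples of a pair, which limits them when handling complex data. To overcome this limitation, a large scale similarity learning method based on similar pairs (LSSL) is proposed, which considers both the difference and the commonness of a sample pair. To learn the corresponding metric matrix, a pair-constrained Gaussian assumption is made for all pedestrian samples. Under this assumption, the prior information of pairs of different pedestrians can be obtained from the prior information of pairs of the same pedestrian. The method improves algorithmic efficiency while maintaining performance on person re-identification and face verification. In summary, this dissertation approaches the problem from the perspectives of feature extraction and metric learning, makes useful contributions to person re-identification in video surveillance, and offers a reference for subsequent research.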
English abstract	Person re-identification is an essential task in video surveillance with distributed multi-camera networks. Its objective is to automatically associate the same person captured by different cameras at different locations and times, which supports the analysis of individual-specific long-term behaviors and activities. In video surveillance, the appearance of the same person in different cameras may vary drastically because of differences in illumination, viewing angle and pose, which makes person re-identification challenging. Current strategies for the problem focus mainly on two aspects: (1) how to extract a suitable appearance-based feature representation that is distinctive for each person and robust to viewpoint and illumination; (2) how to design or learn, from the available training samples, an effective similarity measure for comparing two persons. This dissertation investigates multi-level feature extraction and metric learning along both directions. The main contributions are as follows: (1) This dissertation analyzes the robustness of different color spaces to illumination. Based on their illumination invariance and distinctiveness, several color spaces are selected and fused to reduce the sensitivity of a single color space to illumination. Additionally, color names can help people analyze images semantically in many computer vision applications. Accordingly, this dissertation proposes a novel salient color names based color descriptor (SCNCD). It uses 16 commonly used color names to re-describe the 256$\times$256$\times$256 discrete color points in the original color space, so that the color distribution in a region can be described more succinctly. Features based on SCNCD are more robust to illumination than color histograms, and fusing them with color histograms further improves the accuracy of person re-identification. (2) Inspired by the bag-of-words model, this dissertation proposes a new supervised strategy for learning high-level features. The proposed method exploits the relationship (same or not) among persons in the training set. Specifically, it learns a discriminative vocabulary from the training samples so that, in the metric-embedded higher-level feature space, images of the same person are closer than images of different persons. Compared with the original low-level features, the resulting higher-level features are more discriminative and perform better in person re-identification. (3) Since single-level features have limited representation power, this dissertation proposes weighted linear coding (WLC) to increase the descriptive power of features. WLC learns descriptors at multiple levels: pixel level, patch level and image level. Because it jointly encodes all data from the same region, WLC preserves spatial consistency. The pixel level represents color information, the patch level corresponds to local shape patterns, and the image level provides holistic information about the person. To fully exploit the complementarity of features at different levels, a score-level fusion strategy is used, which improves the accuracy of person re-identification significantly. (4) Metric learning is commonly used to measure the similarity of a sample pair. However, most existing metric learning methods consider only the difference within a pair, which limits them when handling heterogeneous data. To overcome this limitation, this dissertation presents large scale similarity learning (LSSL) using similar pairs. LSSL considers both the difference and the commonness of a pair. To learn the corresponding metric matrix, a pair-constrained Gaussian assumption is made for all person samples. Under this assumption, priors of dissimilar pairs can be obtained from those of similar pairs. The method maintains accuracy in face verification and person re-identification while improving the efficiency of the algorithm. In summary, this dissertation contributes useful work on feature extraction and metric learning for person re-identification, which will benefit further research.
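The score-level fusion mentioned in contribution (3) can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the min-max normalization, the equal default weights, the function names and the toy distances below are all assumptions made for illustration, not the dissertation's method.

```python
def min_max_normalize(scores):
    """Rescale a list of distances to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(level_scores, weights=None):
    """Weighted sum of per-level, min-max-normalized distances (assumed rule).

    level_scores: dict mapping a level name to a list of distances from one
    probe image to each gallery image (smaller means more similar).
    weights: optional dict of per-level weights; equal weights by default.
    """
    names = list(level_scores)
    if weights is None:
        weights = {name: 1.0 / len(names) for name in names}
    normalized = {name: min_max_normalize(level_scores[name]) for name in names}
    n_gallery = len(normalized[names[0]])
    return [
        sum(weights[name] * normalized[name][j] for name in names)
        for j in range(n_gallery)
    ]


def rank_gallery(fused):
    """Gallery indices sorted from best match (smallest fused distance) to worst."""
    return sorted(range(len(fused)), key=lambda j: fused[j])


# Toy distances from one probe to three gallery images (made-up numbers).
scores = {
    "pixel": [0.2, 0.8, 0.5],
    "patch": [4.0, 1.0, 3.0],
    "image": [10.0, 30.0, 20.0],
}
fused = fuse_scores(scores)
ranking = rank_gallery(fused)
```

Normalizing each level before summing keeps a level with a larger numeric range from dominating the fused score; in this toy case the patch level disagrees with the other two, and the fused ranking follows the majority.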
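Keywords	person re-identification; feature extraction; metric learning; salient color names; discriminative dictionary
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/11813
Collection	Graduates_Doctoral Dissertations
Author affiliation	Institute of Automation, Chinese Academy of Sciences
First author affiliation	Institute of Automation, Chinese Academy of Sciences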
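Recommended citation (GB/T 7714)	Yang Yang. Research on Multi-Level Feature Extraction and Metric Learning Algorithms for Person Re-identification[D]. Beijing: Graduate School of the Chinese Academy of Sciences, 2016.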

Files in this item
File name/size	Document type	Version type	Access type	License
面向行人再辨识的多层特征提取和度量学习算（14947KB）	Thesis		Restricted access	CC BY-NC-SA