Research on Cross-Resolution Person Re-Identification
韩苛
2023-05-18
Pages: 112
Degree type: Doctoral
Chinese Abstract

Person re-identification aims to match images of the same person captured by cameras at different locations. With the growing deployment of video surveillance in public places, person re-identification has shown great application potential in searching for suspects, tracking people, and so on, and has attracted wide attention from both academia and industry. Currently, standard person re-identification research usually assumes that all person images have sufficient and similar resolutions, which does not always hold in real-world scenarios. Because the distance between a pedestrian and the camera cannot be controlled, the resolutions of person images can vary greatly, causing standard re-identification models to suffer severe performance degradation. To address this problem, research on cross-resolution person re-identification has been proposed. It aims to overcome the resolution gap between images to improve re-identification performance, and has important research value and significance for promoting the practical application of person re-identification technology.

 

Generally, the target person images enrolled in the gallery have relatively high resolution, while the query person images captured by surveillance cameras have low resolution. To reduce the resolution gap between these images, many existing cross-resolution person re-identification methods use a super-resolution module to upscale the low-resolution images and recover the missing details, and then match the super-resolved images with the other high-resolution images. Building on this idea and targeting the shortcomings of current methods, this research carries out the following three progressive works.

 

1. When super-resolving low-resolution images, current methods usually specify a single scale factor for all images. However, using different scale factors can lead to very different recovery and recognition results. For example, a larger scale factor can recover more details but may introduce a lot of noise, while a smaller scale factor recovers limited details but preserves the original content better. To exploit the complementarity of different scale factors, this research proposes a multi-scale super-resolution and body part fusion model. Based on the generated visual contents, the model jointly trains and fuses multiple super-resolution modules with different scale factors, fully compensating for and learning complementary identity features in an end-to-end manner. To improve robustness to the noise generated during fusion, the model further learns useful local features by dividing and fusing the generated body parts. By adaptively fusing the global features based on multi-scale super-resolution and the local features based on body parts, the model effectively improves person re-identification accuracy.

 

2. Although the above multi-scale super-resolution fusion method compensates for the information gap caused by resolution differences, it requires a large amount of computing resources and inference time. Whether an optimal scale can be predicted from the multiple scales to improve single-scale super-resolution recovery and recognition is therefore a question of real research value. To this end, this research proposes an adaptive scale factor prediction model, which further measures the multi-scale super-resolved image contents produced in the first work and finds a relatively optimal scale factor for each low-resolution image. To deal with the lack of ground-truth annotations of optimal scale factors, the model contains a self-supervised scale factor metric mechanism, which generates dynamic soft labels by comparing the multi-scale super-resolved contents. These soft labels indicate the probability that each scale factor is optimal and are used to supervise content-aware scale factor prediction. At inference time, the model can predict a preferable scale factor for each low-resolution image to improve person re-identification accuracy.

 

3. The above methods always rely on an extra recovery process to handle low-resolution images and neglect learning feature representations of the low-resolution images themselves, which makes inference very time-consuming. To solve this problem, this research adopts the idea of using super-resolved image features to “guide” rather than “replace” low-resolution image features, and proposes a super-resolution guided feature enhancement model for cross-resolution person re-identification. The model first uses the scale factor metric mechanism proposed in the second work to find the optimal scale factor for a given low-resolution image and recover the most discriminative super-resolved image, and then uses it as guidance so that the original low-resolution image features can learn from and approach the corresponding super-resolved image features. In this way, the model can fully learn more discriminative person feature representations from low-resolution images, and can therefore directly process low-resolution images at inference time without performing super-resolution. The model not only achieves high recognition accuracy but also shortens inference time.

 

Extensive experiments and analyses on multiple cross-resolution person re-identification datasets demonstrate the effectiveness and superiority of the above models in improving person re-identification accuracy.

 

English Abstract

Person re-identification (Re-ID) aims to match images of the same person across cameras distributed at different locations. With more and more video surveillance in public places, Re-ID has shown great application potential in searching for suspects, person tracking, etc., and has attracted wide attention from both academia and industry. Currently, standard Re-ID research usually assumes that all available person images have sufficient and similar resolutions, which nevertheless does not always hold in real-world scenarios. The resolutions of captured persons can vary greatly because the distances between walking persons and cameras are uncontrollable, making standard Re-ID models suffer a severe performance drop. To address this problem, research on cross-resolution person Re-ID has been proposed. It aims to overcome the resolution gap to improve Re-ID performance, and has important research value and significance in promoting the real-world application of Re-ID.

 

Generally, target persons of high resolution (HR) are enrolled as the gallery set, while query persons captured by surveillance cameras have low resolution (LR). To bridge the resolution gap between these images, many existing cross-resolution Re-ID works employ a super-resolution (SR) module to upscale an LR image and recover the missing details, and then match the SR image against the HR gallery images. Building on this idea and addressing the defects of current methods, this research progressively carries out the following three works.
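As a rough, generic illustration of this SR-then-match pipeline (not taken from any specific method in the thesis), the Python sketch below super-resolves an LR query with plain bilinear upsampling as a stand-in for a learned SR module, extracts features with a stub encoder, and ranks precomputed HR gallery features by cosine similarity; all module choices and names here are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def rank_gallery(lr_query, gallery_feats, encoder, scale=4):
    """Super-resolve the LR query (here: bilinear upsampling as a stand-in
    for a learned SR module), then rank gallery features by cosine similarity."""
    sr_query = F.interpolate(lr_query, scale_factor=scale,
                             mode="bilinear", align_corners=False)
    q = F.normalize(encoder(sr_query), dim=1)      # B x D query features
    g = F.normalize(gallery_feats, dim=1)          # N x D gallery features
    sim = q @ g.t()                                # B x N cosine similarities
    return sim.argsort(dim=1, descending=True)     # ranked gallery indices

if __name__ == "__main__":
    # Stub encoder standing in for a Re-ID backbone.
    encoder = nn.Sequential(
        nn.Conv2d(3, 128, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten())
    lr_query = torch.randn(2, 3, 32, 16)           # low-resolution query crops
    gallery_feats = torch.randn(100, 128)          # precomputed HR gallery features
    print(rank_gallery(lr_query, gallery_feats, encoder)[:, :5])
```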

 

1. When super-resolving LR images, current methods usually specify a single scale factor for all images. However, using different scale factors may lead to very different recovery and Re-ID results. For example, a larger scale factor recovers more details but may produce much noise, while a smaller one recovers limited details but preserves the original content better. To exploit the complementarity of different scale factors, a Multi-Scale Super-Resolution and Body Part Fusion (SSBF) model is proposed. This model jointly trains and fuses multiple SR modules with different scale factors based on their generated visual contents, to fully compensate for and learn complementary identity features in an end-to-end training manner. To improve robustness to the generated noise during fusion, the model further learns informative local features by dividing and integrating the generated body parts. By adaptively fusing the multi-scale super-resolution-based global features and the body-part-based local features, the model effectively improves person Re-ID accuracy.
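To make the fusion idea concrete, the following is a minimal PyTorch sketch of an SSBF-style forward pass, not the thesis implementation: the tiny SR branches, the shared stub backbone, the horizontal-stripe body parts, the softmax fusion weights, and the scale factors {2, 3, 4} are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySR(nn.Module):
    """Placeholder super-resolution branch for one scale factor."""
    def __init__(self, scale):
        super().__init__()
        self.scale = scale
        self.refine = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, lr):
        up = F.interpolate(lr, scale_factor=self.scale, mode="bilinear",
                           align_corners=False)
        return self.refine(up)

class SSBFSketch(nn.Module):
    """Hypothetical SSBF-style model: multi-scale SR branches, a shared
    backbone, adaptive fusion of global features, and stripe-based parts."""
    def __init__(self, scales=(2, 3, 4), feat_dim=256, num_parts=4):
        super().__init__()
        self.sr_branches = nn.ModuleList([TinySR(s) for s in scales])
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU())
        self.num_parts = num_parts
        # Learnable logits that adaptively weight the scale-specific features.
        self.scale_logits = nn.Parameter(torch.zeros(len(scales)))

    def forward(self, lr):
        global_feats, part_feats = [], []
        for branch in self.sr_branches:
            fmap = self.backbone(branch(lr))                   # B x C x H x W
            global_feats.append(F.adaptive_avg_pool2d(fmap, 1).flatten(1))
            # Split the feature map into horizontal stripes as "body parts".
            stripes = F.adaptive_avg_pool2d(fmap, (self.num_parts, 1))
            part_feats.append(stripes.flatten(1))
        w = torch.softmax(self.scale_logits, dim=0)            # fusion weights
        global_feat = sum(wi * g for wi, g in zip(w, global_feats))
        part_feat = torch.stack(part_feats, dim=0).mean(dim=0)
        return torch.cat([global_feat, part_feat], dim=1)      # fused feature

if __name__ == "__main__":
    lr_batch = torch.randn(2, 3, 64, 32)       # low-resolution person crops
    print(SSBFSketch()(lr_batch).shape)        # torch.Size([2, 1280])
```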

 

2. Although the above multi-scale super-resolution fusion method bridges the information gap caused by the resolution gap, it needs massive computation resources and inference time. Predicting an optimal scale factor from the multiple candidates to improve single-scale super-resolution recovery and Re-ID results is therefore a question of real research value. To this end, this research proposes an Adaptive Scale Factor Prediction (ASP) model, which further measures the multi-scale super-resolution contents produced in the first work and finds a relatively optimal scale factor for each LR image. To deal with the lack of ground-truth optimal scale factors, the model contains a self-supervised scale factor metric that generates dynamic soft labels by comparing the multi-scale super-resolution contents. The soft labels indicate the probability that each scale factor is optimal, and are used as supervision for the content-aware scale factor prediction. During inference, the model adaptively predicts a preferable scale factor for each LR image to improve person Re-ID accuracy.
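The supervision scheme can be sketched as below, assuming the per-scale quality scores produced by the metric are already available as a tensor (the thesis obtains them by comparing the multi-scale SR contents); the predictor network, the softmax soft-labeling, and the KL-divergence loss are illustrative assumptions, not the exact ASP design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScalePredictor(nn.Module):
    """Content-aware predictor: LR image -> logits over candidate scale factors."""
    def __init__(self, num_scales=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, num_scales))

    def forward(self, lr):
        return self.net(lr)

def dynamic_soft_labels(quality_scores, temperature=1.0):
    """Turn per-scale quality scores (B x S) into soft labels: the probability
    that each candidate scale factor is the optimal one for each image."""
    return torch.softmax(quality_scores / temperature, dim=1)

def asp_training_loss(lr_batch, quality_scores, predictor):
    """Self-supervised step: the predictor is supervised by the dynamic soft
    labels instead of (unavailable) ground-truth optimal scale factors."""
    soft_labels = dynamic_soft_labels(quality_scores).detach()
    log_probs = F.log_softmax(predictor(lr_batch), dim=1)
    return F.kl_div(log_probs, soft_labels, reduction="batchmean")

if __name__ == "__main__":
    lr_batch = torch.randn(4, 3, 64, 32)
    quality_scores = torch.randn(4, 3)         # placeholder metric outputs for x2/x3/x4
    predictor = ScalePredictor(num_scales=3)
    asp_training_loss(lr_batch, quality_scores, predictor).backward()
    best_scale_idx = predictor(lr_batch).argmax(dim=1)   # inference-time choice
```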

 

3. The above methods always resort to an extra recovery process to handle LR images and neglect learning representations of the LR images themselves, which makes inference quite time-consuming. To tackle this problem, this research adopts the idea of using SR image features to “guide” rather than “replace” LR image features, and proposes a Super-Resolution Guided Feature Enhancement (SGE) model for cross-resolution Re-ID. Based on the scale factor metric from the second work, this model first recovers the most discriminative SR image for a given LR image by finding the optimal scale factor, and then leverages it as guidance so that the original LR representations learn from and approach the corresponding SR representations. In this way, the model fully learns more discriminative representations from LR images and can directly handle LR images during inference without any recovery step. The model not only achieves high person Re-ID accuracy but also reduces inference time.
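A minimal sketch of the “guide rather than replace” idea, assuming a generic two-branch setup in which normalized LR features are pulled toward the features of the best SR image via an MSE loss; the backbones and the particular guidance loss are assumptions, and only the LR branch is used at inference, mirroring the motivation above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def small_backbone(feat_dim=256):
    """Stub encoder standing in for a Re-ID backbone."""
    return nn.Sequential(
        nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten())

class SGESketch(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.lr_encoder = small_backbone(feat_dim)   # the only branch needed at inference
        self.sr_encoder = small_backbone(feat_dim)   # provides guidance during training

    def guidance_loss(self, lr_img, sr_img):
        """Pull normalized LR features toward the features of the SR image
        recovered at the predicted optimal scale factor."""
        f_lr = F.normalize(self.lr_encoder(lr_img), dim=1)
        with torch.no_grad():                        # SR features act as fixed targets
            f_sr = F.normalize(self.sr_encoder(sr_img), dim=1)
        return F.mse_loss(f_lr, f_sr)

    @torch.no_grad()
    def extract(self, lr_img):
        # Inference path: no super-resolution is performed on the query image.
        return F.normalize(self.lr_encoder(lr_img), dim=1)

if __name__ == "__main__":
    model = SGESketch()
    lr = torch.randn(2, 3, 64, 32)                   # low-resolution queries
    sr = torch.randn(2, 3, 128, 64)                  # SR images at the chosen scale
    model.guidance_loss(lr, sr).backward()
    query_feat = model.extract(lr)                   # used directly for matching
```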

 

This research has conducted extensive experiments and analyses on multiple cross-resolution Re-ID datasets, demonstrating the effectiveness and superiority of the above models.

Keywords: Person re-identification; Low resolution; Super-resolution; Scale factor; Feature fusion
Subject areas: Computer Science and Technology; Artificial Intelligence
Discipline categories: Engineering; Engineering::Control Science and Engineering
Language: Chinese
Sub-direction (of the seven major directions): Image and Video Processing and Analysis
State Key Laboratory planned research direction: Visual Information Processing
Associated dataset to be deposited:
Document type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/52173
Collection: Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714):
韩苛. 跨分辨率行人重识别研究[D], 2023.
Files in this item
File name / size    Document type    Access    License
博士毕业论文.pdf (10784 KB)    Thesis    Restricted access    CC BY-NC-SA
 
