CASIA OpenIR
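Research on Face and Pedestrian Detection Methods Based on the Single-Shot Detector Framework (基于单步检测器框架的人脸和行人检测方法研究)
Zhuang Chubin (庄楚斌)
Subtype: Master's
Thesis Advisor: Lei Zhen (雷震)
2020-06
Degree Grantor: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: face detection; pedestrian detection; real-time performance; single-shot detector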
Abstract

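Intelligent video surveillance is an important security monitoring tool. It uses computer processing to supply rich, accurate analysis of video imagery, overcoming the inefficiency of traditional manual monitoring and enabling safer, more effective monitoring and management of public places. Faces and pedestrians are its two main objects of study. Because their data are easy to acquire and have clear practical value, detecting faces and pedestrians in surveillance scenes efficiently and accurately with image processing techniques has become an important research topic, with considerable research significance and practical value for the related industries.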

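Face and pedestrian detection is the process of analyzing input image data to find and mark the locations of faces and pedestrians. In practice, traditional detection methods perform poorly because of variation in pose, viewing angle, and illumination. In recent years, with the progress of artificial intelligence, most high-accuracy detectors have been built on deep neural networks. Although these algorithms are accurate, their high complexity and long running time prevent them from meeting the real-time requirements of practical applications. To address this, this thesis designs an efficient detection model based on the single-shot detector framework to keep the algorithm real-time, and optimizes and designs algorithms for the specific problems of face detection and pedestrian detection to meet the needs of practical applications.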

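Specifically, this thesis builds its model on the single-shot detector SSD framework and borrows the idea of multi-stage regression from multi-stage detectors, adjusting the predicted boxes with a coarse-to-fine, two-stage regression. This keeps the speed advantage of a single-shot detector while effectively addressing its lower accuracy. To further improve real-time performance, the thesis also designs a fast feature extraction network that extracts features from the input image efficiently, giving the algorithm a higher processing speed. For face detection, where scale varies widely and small faces are hard to detect, the thesis 1) designs a feature fusion module that enhances the multi-scale feature maps; 2) introduces an attention mechanism into the second-stage box regression to strengthen the features of regions likely to contain faces and reduce interference from background features, improving detection of multi-scale faces in complex scenes; and 3) proposes a landmark anchor design that predicts faces and facial landmarks jointly, extending the potential applications of the face detection model. For pedestrian detection, where positive and negative samples are imbalanced and pedestrians occlude one another, the thesis 1) proposes a new bounding box regression loss that helps the model localize pedestrians more precisely; 2) introduces soft labels during positive and negative sample generation, making full use of the boundary samples that fall between the positive and negative thresholds, which increases the number of positive samples and improves the model's robustness to boundary samples; and 3) optimizes the existing non-maximum suppression and anchor matching algorithms so that they are better suited to pedestrian detection under occlusion.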
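The abstract does not give the details of the coarse-to-fine regression. As a minimal sketch only, assuming the common center-size box parameterization and hypothetical function names (`decode`, `two_step_decode`) that do not come from the thesis, two-stage regression can be pictured as decoding one set of offsets against the original anchor and then decoding a second set against the refined anchor:

```python
import math


def decode(anchor, deltas):
    """Apply center-size offsets (dx, dy, dw, dh) to an anchor (cx, cy, w, h).

    Assumed parameterization: center shifts are scaled by the anchor size,
    width and height are scaled exponentially. The thesis may differ.
    """
    cx, cy, w, h = anchor
    dx, dy, dw, dh = deltas
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))


def two_step_decode(anchor, coarse_deltas, fine_deltas):
    """Coarse step refines the anchor; fine step regresses from the refined anchor."""
    refined_anchor = decode(anchor, coarse_deltas)
    return decode(refined_anchor, fine_deltas)


# Hypothetical numbers chosen for illustration only.
box = two_step_decode(
    (10.0, 10.0, 4.0, 4.0),   # original anchor
    (0.5, 0.0, 0.0, 0.0),     # coarse offsets: shift right by half a width
    (0.0, 0.25, 0.0, 0.0),    # fine offsets: shift down by a quarter height
)
print(box)  # (12.0, 11.0, 4.0, 4.0)
```

The point of the second step is that its offsets are predicted relative to an anchor that is already closer to the object, so each step only has to correct a smaller error.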

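Addressing real-time face and pedestrian detection in surveillance scenes, this thesis designs efficient and robust face and pedestrian detection algorithms that meet the real-time and accuracy requirements of practical applications. The work has broad research significance and practical value, and supports the development of related application industries such as intelligent video surveillance and autonomous driving.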

Other Abstract

As an important means of security monitoring, intelligent video surveillance aims to provide rich and accurate video image analysis data, overcome the low efficiency of traditional manual monitoring, and enable safer and more effective management of public places through computer technology. Faces and pedestrians are the two main research objects in intelligent video surveillance. Their data are easy to obtain and have high potential practical value, so using image processing technology to detect faces and pedestrians efficiently and accurately is of great importance to the development of security monitoring.

Face detection and pedestrian detection refer to processing the input image data to find and mark the locations of faces and pedestrians. In practice, traditional detection methods are not very effective because of variation in pose, viewing angle, and lighting. In recent years, with the development of artificial intelligence technology, most high-accuracy detectors have been built on deep neural networks. Although these algorithms achieve high accuracy, they are too slow to meet real-time requirements. To address this problem, this thesis designs an efficient detection model based on the single-shot detector framework to ensure real-time performance, and optimizes and designs modules for the specific problems of face detection and pedestrian detection, so as to meet the requirements of practical applications.

Specifically, this thesis proposes a real-time detection model based on the single-shot detector SSD framework and borrows the idea of multi-stage regression from multi-stage detectors, applying a coarse-to-fine, two-step regression to the predicted bounding boxes. This retains the speed advantage while effectively addressing the detector's lower accuracy. To further improve real-time performance, the thesis also designs a lightweight feature extraction network that extracts features from the image data efficiently and raises the processing speed of the algorithm. For the face detection task, to handle large variation in scale and the difficulty of detecting small faces, 1) a feature fusion module is designed to enhance the multi-scale feature maps; 2) an attention mechanism is introduced in the second step of bounding box regression to strengthen the features of face regions, reducing interference from background features and improving detection of multi-scale faces in complex scenes; 3) a landmark anchor design is proposed for joint prediction of faces and landmarks, extending the potential value of the model. For the pedestrian detection task, to handle the imbalance between positive and negative samples and the difficulty of detecting occluded pedestrians, 1) a novel bounding box regression loss is proposed to improve the accuracy of pedestrian localization; 2) soft labels are introduced into the sample generation process to make full use of the boundary samples between the positive and negative thresholds, which increases the number of valid samples and improves the robustness of the model; 3) the existing non-maximum suppression and anchor matching algorithms are optimized to make them more suitable for occluded pedestrian detection.
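The abstract does not say how the soft labels are computed. As a minimal sketch, assuming anchors are scored by their IoU with a ground-truth box, and using hypothetical thresholds (0.3 and 0.5) and a linear ramp between them that are illustrative choices rather than the thesis's values, a soft label for a boundary sample could look like this:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_label(overlap, neg_thresh=0.3, pos_thresh=0.5):
    """Map an IoU value to a classification target in [0, 1].

    Assumed scheme: hard negative at or below neg_thresh, hard positive at
    or above pos_thresh, and a linear ramp for boundary samples in between.
    """
    if overlap >= pos_thresh:
        return 1.0
    if overlap <= neg_thresh:
        return 0.0
    return (overlap - neg_thresh) / (pos_thresh - neg_thresh)


# Hypothetical anchor and ground-truth boxes for illustration.
anchor = (0.0, 0.0, 2.0, 2.0)
ground_truth = (1.0, 0.0, 3.0, 2.0)
label = soft_label(iou(anchor, ground_truth))
```

With hard labels, an anchor whose IoU falls between the two thresholds is typically ignored during training; a graded target such as this one lets it contribute as a partially positive sample.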

To address real-time face and pedestrian detection in video surveillance scenarios, this thesis proposes efficient and robust face detection and pedestrian detection algorithms that meet the real-time and accuracy requirements of practical applications. These algorithms are broadly significant for the development of related applications such as intelligent video surveillance and autonomous driving.

Pages: 75
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/38538
Collection: Institute of Automation, Chinese Academy of Sciences
Recommended Citation
GB/T 7714
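Zhuang Chubin (庄楚斌). Research on Face and Pedestrian Detection Methods Based on the Single-Shot Detector Framework (基于单步检测器框架的人脸和行人检测方法研究)[D]. Institute of Automation, Chinese Academy of Sciences. Institute of Automation, Chinese Academy of Sciences, 2020.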
Files in This Item:
File Name/Size: 基于单步检测器框架的人脸和行人检测方法研(3843KB)
DocType: Thesis
Access: Open Access
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.