Research on Pedestrian Detection Combining Deep Learning and Traditional Methods


	Research on Pedestrian Detection Combining Deep Learning and Traditional Methods
	Tong Bei (童贝)
	2017-05-26
Degree type	Master of Engineering
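Abstract (translated from the Chinese)	Object detection is a long-standing problem in computer vision, and pedestrian detection is a typical instance of it, with important applications in autonomous driving, intelligent surveillance, intelligent robotics, and related fields. After more than a decade of exploration, pedestrian detection has advanced rapidly and produced a series of classic algorithms; in recent years in particular, the introduction of deep learning has markedly improved the accuracy of pedestrian detection models. Compared with traditional algorithms, deep learning methods need no hand-designed discriminative features and achieve high accuracy, but they are also slow and demanding of hardware, which has constrained their deployment in products and has prompted the emergence of network models that balance speed and accuracy. Although pedestrian detection is maturing, many problems remain open in current research. First, pedestrians in road scenes are small targets, and most pedestrian detection frameworks still struggle with small objects. Second, road traffic environments are complex, pedestrian occlusion is severe, and there are many distracting objects. Third, pedestrians often carry additional attributes such as hats, bags, and suitcases, which adds to the complexity of detection. To address these problems, this thesis studies and extends both traditional pedestrian detection methods and deep learning methods. The main contributions are: 1. To address the difficulty the Faster R-CNN framework has with small targets, an end-to-end multi-scale model based on Faster R-CNN is proposed. The framework does not depend on an additional proposal detection algorithm or feature extraction method; it learns and selects feature information at different scales according to the size of each candidate box. The network structure is simple and easy to implement and extend. 2. Faster R-CNN is extended with a Neural Cascade structure to address hard example selection and the excess of false positive samples. Multiple weak classifiers are built from multi-layer perceptrons and combined into a strong classifier that pre-screens usable candidate boxes. The network is trained on the selected hard examples, which reduces computation while further improving detection accuracy. 3. A variety of network structures are considered and analyzed, and an extensive series of experiments is carried out to evaluate their performance. On several public pedestrian detection datasets, the proposed end-to-end multi-scale model achieves good results in test time and the corresponding evaluation metrics. Key experimental techniques are also summarized and analyzed.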
English abstract	Object detection is an enduring topic in computer vision. As a typical object detection task, pedestrian detection has significant practical applications in security, autonomous driving, intelligent surveillance, and robotics. After more than ten years of exploration, pedestrian detection technology has developed greatly and a series of classic algorithms has been proposed. In recent years especially, with the introduction of deep learning methods, pedestrian detection models have improved substantially. Although deep-learning-based models can achieve good performance without hand-crafted features, their low efficiency and high hardware requirements may limit their practical use. As a consequence, various deep network models that trade off speed against accuracy have been proposed. Although excellent performance and efficiency have been achieved, pedestrian detection still has a long way to go, for the following reasons. First, pedestrians in road scenes are often too small to contain enough feature points, and most pedestrian detection frameworks perform poorly on small targets. Second, the road traffic environment is complex, and pedestrian occlusion caused by various kinds of interference is severe. Third, pedestrians often have additional attributes, such as hats, bags, and suitcases, which increase the complexity of pedestrian detection. In this thesis, we study both traditional methods and deep learning models for pedestrian detection and integrate them to provide a new solution for this task. The main contributions of this thesis can be summarized as follows: 1. An end-to-end multi-scale model based on Faster R-CNN is proposed to address the small-target problem in the Faster R-CNN framework. The model requires no external proposal detection or feature extraction methods and is capable of learning and selecting feature information at different scales based on the size of the candidate boxes. The structure of the model is simple and easy to implement and extend. 2. Faster R-CNN is extended with a neural cascade structure to address hard negative example mining and excessive false positive samples. A strong classifier composed of multiple weak classifiers based on multi-layer perceptrons (MLPs) is constructed and applied to screen the usable candidate boxes in advance. The network is then trained on the selected hard examples, which relieves the computational burden and improves accuracy as well. 3. A variety of network structures are explored and analyzed, and a series of experiments is conducted to evaluate their performance. The end-to-end multi-scale model presented in this thesis is fast and performs well on the corresponding evaluation metrics of certain public pedestrian detection datasets. Some crucial techniques used in the experiments are summarized and analyzed as well.
Keywords	Pedestrian detection; Deep learning; Faster R-CNN; Cascade classifier; ACF
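Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/14638
Collection	Graduates_Master's Theses
Recommended citation (GB/T 7714)	Tong Bei. Research on Pedestrian Detection Combining Deep Learning and Traditional Methods [D]. Beijing. Graduate School of the Chinese Academy of Sciences, 2017.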

Files in this item
File name/size	Document type	Version type	Access type	License
template.pdf (22597KB)	Thesis		Restricted access	CC BY-NC-SA