Research on Object Detection Methods in Traffic Scenes Based on Semantic Enhancement and Border Perception


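Title	Research on Object Detection Methods in Traffic Scenes Based on Semantic Enhancement and Border Perception
Author	Wang Xiaolian (王晓莲)
Date	2022-08-19
Pages	132
Degree type	Doctoral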
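Chinese abstract (translated)	With the spread of traffic surveillance systems and the development of autonomous driving, image and video data from traffic scenes is growing explosively, and computer vision is needed to analyze and process this volume of data quickly. Object detection underlies many high-level vision tasks: accurately recognizing traffic objects in images and videos and obtaining their positions provides essential information for downstream tasks such as road anomaly detection, trajectory tracking, and object re-identification. Designing accurate object detection algorithms is therefore important for the development of intelligent transportation and the maintenance of public safety. General object detection has advanced greatly in recent years with deep learning, but detection in traffic scenes still faces several problems: the many background objects in the environment introduce a large amount of interfering information and easily cause false detections in the background; single-stage detectors struggle with the property changes that training samples undergo in complex scenes and are prone to missed detections; and commonly used regression-based detection models lack constraints on, and learning of, spatial features, which limits localization accuracy. To address these problems, this thesis studies feature semantic enhancement and object border perception, improving detection performance by guiding the detector toward good feature learning. The main research content and contributions are as follows:
1. To address the interference caused by complex backgrounds in traffic surveillance scenes, a vehicle detection method with motion semantic enhancement is proposed. The core idea is to use object motion information, which distinguishes objects from the static background, to enhance foreground object features and suppress the background. The difficulty is how to use motion semantics for attention guidance while preventing stationary foreground objects from being treated as background and wrongly suppressed. The model therefore adopts a parallel two-branch structure with an appearance branch and a motion branch. It first separates the embedding of motion semantics from its effect on the features, then builds cross-attention between the two branches so that the high-response foreground features enhanced by motion semantics in the motion branch are passed to the appearance branch, guiding the latter to learn all foreground objects efficiently. Experiments on public datasets show that the method mitigates background interference while effectively improving the network's perception of all foreground objects, which improves detection performance.
2. To address the problem that negative training samples of single-stage detectors change in property, which hinders the model from learning discriminative foreground/background features, an object detection training method based on consistent negative sample mining is proposed. Its core is a consistent negative sample mining loss that adjusts the training attention given to samples and thereby improves the optimization of the detector. First, based on the unsupervised regression behavior of negative anchors, consistent negatives that stably exhibit background properties are mined from the negative set; the remaining negatives, which exhibit foreground properties, are biased negatives. Then the original classification loss is modified according to the background-class information entropy of the negatives: lowering the loss weights of biased negatives weakens their training attention and reduces conflict in foreground/background feature learning. Experiments on multiple datasets show that the method effectively improves the extracted foreground and background features and raises detection performance.
3. To address the weak perception of spatial detail in regression-based detectors, and the inability of one-dimensional coordinate supervision to adequately constrain the learning of spatial features, an object detection model based on object border perception and box evolution is proposed. Its core is to extend one-dimensional localization supervision into the two-dimensional image space: by constraining the model to learn border-sensitive spatial information in the image domain, the model explicitly perceives object borders and focuses on learning border features. First, drawing on active contour models, an energy functional of the bounding box is constructed whose energy terms are learned and generated by a convolutional neural network, which introduces localization supervision into the two-dimensional image space. Then the learned border-sensitive information is coupled with the regression branch, and gradient updates of the energy functional guide the regressed box to evolve toward the object's edges. Experiments on multiple public datasets show that jointly learning spatial information and the regression branch effectively improves the basic localization features extracted by the model and raises its localization accuracy.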
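The reweighting idea in the second contribution can be illustrated with a minimal sketch. The abstract does not give the actual loss formula, so everything concrete here is an assumption made for illustration: the function names, the `is_biased` flags (standing in for the result of the mining step), and the weight `1 - H / log 2` applied to biased negatives. It is not the thesis's loss.

```python
import math

EPS = 1e-12  # numerical floor that keeps log() finite


def binary_entropy(p_bg):
    """Entropy (in nats) of the background/foreground decision for one anchor."""
    p = min(max(p_bg, EPS), 1.0 - EPS)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def reweighted_negative_loss(p_bg, is_biased):
    """Mean cross-entropy over negative anchors, with biased ones down-weighted.

    p_bg      -- predicted background probability per negative anchor
    is_biased -- hypothetical flags from the mining step (True = biased negative)
    """
    total = 0.0
    for p, biased in zip(p_bg, is_biased):
        ce = -math.log(max(p, EPS))  # standard loss for a background label
        # Assumed weighting: the more uncertain a biased negative is, the less
        # it contributes; consistent negatives keep their full weight of 1.
        weight = (1.0 - binary_entropy(p) / math.log(2.0)) if biased else 1.0
        total += weight * ce
    return total / len(p_bg)


# Same predictions, with and without the second anchor flagged as biased.
consistent_only = reweighted_negative_loss([0.9, 0.5], [False, False])
with_biased = reweighted_negative_loss([0.9, 0.5], [False, True])
```

Flagging the uncertain anchor as biased lowers the total loss, so that anchor pulls less on the classifier during training. This is the qualitative effect the abstract describes.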
English abstract	With the popularization of traffic surveillance systems and the development of autonomous driving technology, image and video data from traffic scenes is growing explosively, and computer vision methods are needed to analyze and process this massive data quickly. Object detection is the basis of many high-level visual tasks. Accurately recognizing various traffic objects and obtaining their positions from images and videos can provide important basic information for downstream tasks such as road anomaly detection, trajectory tracking and object re-identification. Therefore, designing accurate object detection algorithms is of great significance for the development of intelligent transportation and the maintenance of public safety. In recent years, general object detection has made great progress with the development of deep learning, but object detection in traffic scenes still faces a series of problems: rich background objects in the environment bring a lot of interfering information, which makes detectors prone to false positives in the background; single-stage detectors have difficulty dealing with the property changes of training samples in complex scenes, which makes them prone to missed detections; and commonly used regression-based detection models lack constraints on and learning of spatial features, which limits localization accuracy. To address these problems, this thesis focuses on feature semantic enhancement and object border perception, improving detection performance by guiding the detector toward good feature learning. The main contributions of this thesis are summarized as follows:
1. To mitigate the interference of complex backgrounds in traffic surveillance, a vehicle detection method based on motion semantic enhancement is proposed. The core idea of this method is to use object motion information, which distinguishes objects from the static background, to enhance foreground features and suppress background features. The difficulty lies in how to use motion semantics for attention guidance while preventing static foreground objects from being incorrectly suppressed as background. To cope with this difficulty, the proposed model adopts a two-branch parallel structure containing an Appearance Branch and a Motion Branch, which first separates the embedding of motion semantics from its effect on features. Cross-attention is then constructed between the two branches, and the high-response foreground features enhanced by motion semantics in the Motion Branch are transferred to the Appearance Branch to guide the latter to learn all foreground objects efficiently. Experimental results on public datasets show that the proposed method effectively improves the model's perception of all foreground objects while mitigating background interference, leading to improved detection performance.
2. In single-stage detectors, the property of some negative samples changes during training, which interferes with the model's foreground/background feature learning. To alleviate this problem, an object detection training method based on consistent negative sample mining is proposed. The core of this method is a consistent negative sample mining loss that adjusts the training attention given to samples, so as to improve the optimization process of object detectors. First, according to the unsupervised regression performance of negatives, consistent negatives with a stable background property are mined from the negative set, and the remaining negatives, which show a foreground property, are treated as biased negatives. Then the original classification loss is modified according to the background information entropy of the negatives. Reducing the loss weights of biased negatives weakens their training attention and thus reduces the conflict in foreground/background feature learning. Experimental results on multiple datasets show that the proposed method effectively improves the learning of foreground and background features and improves detection performance.
3. In view of the weak spatial perception of regression-based detectors and the limitation of one-dimensional coordinate supervision in constraining models to learn spatial features, a detector based on border awareness and box evolution is proposed to improve the precision of object localization. The core of the proposed method is to extend one-dimensional localization supervision to the two-dimensional image space: by constraining the learning of spatial details sensitive to object borders in the image space, the model can explicitly perceive object borders and focus on learning their features. First, motivated by Active Contour Models, an energy functional of the bounding box is constructed. The energy terms are learned and generated by convolutional neural networks, which introduces localization supervision into the two-dimensional image space. Then the learned border-sensitive information is coupled with the regression branch, and the gradient updates of the energy functional guide regressed boxes to evolve towards object borders. Experiments on several public datasets show that the joint learning of spatial information and the regression branch effectively improves the basic localization features extracted by the model, leading to improved localization precision.
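The box-evolution idea in the third contribution can be illustrated with a toy sketch. It makes several assumptions that are not from the thesis: the border-sensitive map is a hand-made binary grid rather than a CNN output, the energy is simply the negative mean map response along the box sides, and the gradient update is replaced by a greedy one-pixel search over the four box coordinates. The names `border_energy` and `evolve_box` are hypothetical.

```python
def border_energy(edge_map, box):
    """Negative mean edge response along the four sides of box = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    top = [edge_map[y1][x] for x in range(x1, x2 + 1)]
    bottom = [edge_map[y2][x] for x in range(x1, x2 + 1)]
    left = [edge_map[y][x1] for y in range(y1, y2 + 1)]
    right = [edge_map[y][x2] for y in range(y1, y2 + 1)]
    values = top + bottom + left + right
    return -sum(values) / len(values)


def evolve_box(edge_map, box, max_steps=50):
    """Greedy stand-in for gradient descent: move one side by one pixel per step
    whenever that strictly lowers the energy, and stop when no move helps."""
    height, width = len(edge_map), len(edge_map[0])
    box = list(box)
    for _ in range(max_steps):
        best, best_energy = None, border_energy(edge_map, box)
        for i in range(4):
            for delta in (-1, 1):
                cand = list(box)
                cand[i] += delta
                x1, y1, x2, y2 = cand
                if not (0 <= x1 < x2 < width and 0 <= y1 < y2 < height):
                    continue  # skip degenerate or out-of-image boxes
                energy = border_energy(edge_map, cand)
                if energy < best_energy:
                    best, best_energy = cand, energy
        if best is None:
            break
        box = best
    return tuple(box)


# Synthetic 8x8 "border map": a rectangle outline from (2, 2) to (5, 5).
edge_map = [[0.0] * 8 for _ in range(8)]
for k in range(2, 6):
    edge_map[2][k] = edge_map[5][k] = 1.0  # top and bottom sides
    edge_map[k][2] = edge_map[k][5] = 1.0  # left and right sides

result = evolve_box(edge_map, (1, 1, 6, 6))
```

Starting one pixel outside the outline, the box shrinks side by side until all four sides lie on the high-response border pixels. In the thesis, the corresponding signal comes from learned energy terms coupled with the regression branch rather than from a fixed map.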
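Keywords	Object detection; deep convolutional neural networks; motion semantics; consistent negative samples; border awareness and box evolution
Language	Chinese
Document type	Thesis
Identifier	http://ir.ia.ac.cn/handle/173211/49690
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	Wang Xiaolian. Research on Object Detection Methods in Traffic Scenes Based on Semantic Enhancement and Border Perception [D]. Institute of Automation, Chinese Academy of Sciences, 2022.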

Files in this item
File name/size	Document type	Version type	Access type	License
基于语义增强与边界感知的交通场景下目标检（26080KB）	Thesis		Restricted access	CC BY-NC-SA