With the popularization of traffic surveillance systems and the development of autonomous driving technology, traffic image and video data are growing explosively, and computer vision methods are needed to analyze and process this massive data quickly. Object detection is the basis of many high-level visual tasks: accurately recognizing various traffic objects and locating them in images and videos provides essential input for downstream tasks such as road anomaly detection, trajectory tracking, and object re-identification. Designing accurate object detection algorithms is therefore of great significance for the development of intelligent transportation and the maintenance of public safety.
In recent years, general object detection has made great progress with the development of deep learning, but object detection in traffic scenes still faces a series of problems: rich background objects in the environment introduce substantial interference, making detectors prone to false positives in the background; single-stage detectors struggle with the changing properties of training samples in complex scenes, making them prone to missed detections; and the commonly used regression-based detection models lack constraints on, and learning of, spatial features, which limits localization accuracy. In response to the above problems, this thesis focuses on feature semantic enhancement and object border perception, improving detection performance through better feature learning. The main contributions of this thesis are summarized as follows:
1. To mitigate the interference of complex backgrounds in traffic surveillance, a vehicle detection method based on motion semantic enhancement is proposed. The core idea is to use objects' motion information, which distinguishes them from the static background, to enhance foreground features and suppress background features. The difficulty lies in using motion semantics for attention guidance while avoiding static foreground objects being incorrectly suppressed as background. To cope with this difficulty, the proposed model adopts a two-branch parallel structure consisting of an Appearance Branch and a Motion Branch, which first separates the embedding of motion semantics from its effect on features. Cross-attention is then constructed between the two branches, and high-response foreground features enhanced by motion semantics in the Motion Branch are transferred to the Appearance Branch, guiding the latter to efficiently learn all foreground objects. Experimental results on public datasets show that the proposed method effectively improves the model's perception of foreground objects while mitigating background interference, leading to improved detection performance.
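As a rough illustration only (the thesis's actual architecture is not given here, and all names and dimensions below are assumptions), the cross-attention step can be sketched as queries drawn from the Appearance Branch attending over keys and values from the motion-enhanced Motion Branch:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(appearance, motion):
    # Queries come from the Appearance Branch; keys and values come from
    # the motion-enhanced Motion Branch, so high-response foreground
    # positions in the motion features guide the appearance features.
    q, k, v = appearance, motion, motion           # (N, d) token features
    scores = q @ k.T / np.sqrt(q.shape[-1])        # (N, N) affinities
    attn = softmax(scores, axis=-1)                # each row sums to 1
    return appearance + attn @ v                   # residual guidance
```

The residual form keeps the appearance features intact while adding motion-guided emphasis, which matches the described goal of guiding, rather than replacing, appearance learning.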
2. To alleviate the problem that, in single-stage detectors, the properties of some negative samples change during training and interfere with the model's foreground/background feature learning, an object detection training method based on consistent negative sample mining is proposed. The core of this method is a consistent negative sample mining loss that adjusts the training attention paid to samples, thereby improving the optimization of object detectors. First, according to the unsupervised regression performance of negatives, consistent negatives with a stable background property are mined from the negative set; the remaining negatives, which exhibit foreground properties, are treated as biased negatives. The original classification loss is then modified according to the background information entropy of the negatives: by reducing the loss weights of biased negatives, their training attention is weakened, reducing the conflict in foreground/background feature learning. Experimental results on multiple datasets show that the proposed method effectively improves the learning of foreground and background features and improves detection performance.
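The weighting idea can be sketched as follows. This is a minimal stand-in, not the thesis's actual loss: the `p_bg ** gamma` down-weighting is an illustrative assumption for the entropy-based weight described above, and `is_consistent` stands for the result of the mining step.

```python
import math

def mined_negative_loss(p_bg, is_consistent, gamma=2.0):
    """Weighted background cross-entropy for one negative sample.

    p_bg: predicted background probability of the negative.
    is_consistent: whether mining marked it as a stable background sample.
    """
    ce = -math.log(max(p_bg, 1e-12))       # plain background cross-entropy
    # Consistent negatives keep full weight; biased negatives (whose
    # foreground-like responses would conflict with background learning)
    # are down-weighted -- here by p_bg ** gamma as a toy surrogate.
    w = 1.0 if is_consistent else p_bg ** gamma
    return w * ce
```

A biased negative with a low background probability thus contributes far less loss than it would under the unmodified cross-entropy, which is the intended reduction of its training attention.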
3. In view of the weak spatial perception of regression-based detectors and the limited ability of one-dimensional coordinate supervision to constrain models to learn spatial features, a border-aware, box-evolving detector is proposed to improve the precision of object localization. The core of the method is to extend the one-dimensional localization supervision to the two-dimensional image space; by constraining the learning of spatial details sensitive to object borders in the image space, the model can explicitly perceive object borders and focus on their feature learning. First, an energy functional of the bounding box is constructed, motivated by Active Contour Models. The energy terms are learned and generated by convolutional neural networks, introducing localization supervision into the two-dimensional image space. The learned border-sensitive information is then coupled with the regression branch, and the gradient updates of the energy functional guide regressed boxes to evolve toward object borders. Experiments on several public datasets show that the joint learning of spatial information and the regression branch effectively improves the basic localization features extracted by the model, leading to improvements in localization precision.
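The box-evolution idea can be sketched with a toy discrete descent on a hand-made energy map. In the thesis the energy terms are produced by convolutional networks; here the map, the per-edge energy, and the one-pixel hill-climbing update are all illustrative assumptions, showing only how minimizing edge energy pulls a box toward low-energy object borders.

```python
import numpy as np

def box_energy(emap, box):
    # Mean of the energy map along the four box edges; using means
    # avoids rewarding or penalizing a longer perimeter.
    x1, y1, x2, y2 = box
    return (emap[y1:y2 + 1, x1].mean() + emap[y1:y2 + 1, x2].mean()
            + emap[y1, x1:x2 + 1].mean() + emap[y2, x1:x2 + 1].mean())

def evolve_box(emap, box, steps=20):
    # Discrete gradient-descent stand-in: move each coordinate by one
    # pixel whenever that strictly lowers the energy, so the box drifts
    # toward the low-energy borders encoded in emap.
    box = list(box)
    h, w = emap.shape
    for _ in range(steps):
        improved = False
        for i in range(4):              # x1, y1, x2, y2 in turn
            for d in (-1, 1):
                cand = box.copy()
                cand[i] += d
                x1, y1, x2, y2 = cand
                if (0 <= x1 < x2 < w and 0 <= y1 < y2 < h
                        and box_energy(emap, cand) < box_energy(emap, box)):
                    box, improved = cand, True
        if not improved:                # local minimum reached
            break
    return tuple(box)
```

With an energy map whose minima lie on a target rectangle's border, an initial box placed inside evolves outward until its edges settle on that border, mirroring how the learned energy functional guides regressed boxes.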
|Keyword||object detection; deep convolutional neural networks; motion semantics; consistent negative samples; border perception and box evolution|
|王晓莲. Research on Object Detection Methods in Traffic Scenes Based on Semantic Enhancement and Border Perception [D]. Institute of Automation, Chinese Academy of Sciences, 2022.|