复杂环境下的图像目标检测与可视化 (Image Object Detection and Visualization in Complex Environments)
潘兴甲
Subtype: Doctoral
Thesis Advisor: 徐常胜
2020-12-04
Degree Grantor: 中国科学院大学
Place of Conferral: 中国科学院自动化研究所
Degree Name: Doctor of Engineering (工学博士)
Degree Discipline: Pattern Recognition and Intelligent Systems
Keywords: object detection, dynamic refinement, rotating convolution, densely packed objects, self-supervision, large-scale objects, image visualization
Abstract

Object detection is a long-standing fundamental problem in computer vision. Its goal is to determine whether a given image contains object instances of given categories and, if so, to return the spatial location and extent of each instance. Image visualization refers to the technique of selecting images of interest from an image collection and displaying them; it mainly consists of two processes, image summarization and layout generation, and plays an important role in helping people quickly capture the key information in visual big data. Image object detection and visualization analyze visual data at two levels, objects of interest and images of interest, and are therefore of great significance for understanding and mastering visual big data.

This dissertation studies image object detection and visualization in complex environments.
The main problems of current image object detection and visualization techniques are as follows. (1) Constrained by GPU memory, deep-learning-based object detectors can only accept inputs within a limited scale range; when the images in an application scenario are very large, existing methods usually resort to scaling or cropping, which inevitably degrades detection performance and increases the computational complexity of the model. (2) In real-world scenarios, objects often appear at arbitrary rotation angles and with diverse appearances; current mainstream methods struggle to extract accurate object features and cannot adapt dynamically to the diversity of the samples, which limits model performance. (3) Existing image visualization techniques rely only on handcrafted or single-scene image features and therefore cannot describe images completely, and during layout generation they can hardly satisfy constraints such as shape preservation and content relevance at the same time, so the visualization results are unsatisfactory. To address these problems, this dissertation combines recent advances in machine learning and computer vision and proposes high-performance image object detection and visualization methods for complex scenes.

The main research work and contributions of this dissertation are as follows:
    1. A self-supervised feature-augmentation method for large-scale image object detection.
    In CNN-based detectors, the input image scale strongly affects detection performance. Under GPU memory constraints, when the images in a scene are very large the optimal input scale cannot be reached, while scaling or cropping the images inevitably loses information or reduces efficiency. To address this, the dissertation proposes a large-scale image object detection method based on a self-supervised feature augmentation network, whose main components are a residual up-sampling module built on sub-pixel convolution and a self-supervised feature augmentation module. By introducing real high-resolution information and a low-loss up-sampling process, a detector with small-scale input achieves performance comparable to that with large-scale input while improving detection efficiency. The method won first place in the Mapillary Vistas Object Detection Challenge held at ECCV 2018, and its effectiveness was further verified on the Cityscapes and MS COCO datasets.
    2. An oriented object detection method based on adaptive feature enhancement and dynamic refinement.
    Although mainstream detectors have made remarkable progress on datasets such as Pascal VOC and MS COCO, they perform poorly in scenarios where objects are rotated at arbitrary angles or densely packed. Moreover, most current methods follow a static paradigm: at test time the model keeps the parameters learned during training fixed when inferring on a given sample, so it cannot adapt its predictions to samples whose appearance (viewpoint, scale) varies widely. To address this, the dissertation proposes a dynamic refinement network for detecting densely packed and arbitrarily rotated objects. First, a feature selection module based on rotating convolution is proposed: the sampling positions of the convolution kernels are rotated according to the predicted angle to obtain angle-insensitive features, and kernels of different shapes allow the network to adjust its receptive field dynamically to the object shape, yielding more accurate feature representations. Second, dynamic refinement modules are designed for the classification and regression branches respectively, so that the classifier and the regressor can capture sample-specific information and produce more accurate classification and size-regression results. The effectiveness of the method is verified on the public DOTA, HRSC2016, and SKU110K-R datasets.
    3. A content-aware image data visualization method based on object detection.
    Image data visualization consists of two processes, image summarization and layout generation, but most existing work focuses on only one of them. Most image summarization methods rely on manually defined features and therefore cannot describe the object and scene information completely. Existing layout generation methods either ignore the shapes of the images or fail to preserve their full content. Considering these shortcomings, the dissertation proposes a method that automatically generates a visual summary of an image collection based on object content. It first proposes a latent-topic-based image summarization method that jointly considers the diversity, conciseness, and aesthetics of the image content, and then proposes a tree-based layout generation method in which two-stage tree construction and two-stage tree optimization significantly improve layout quality, advancing image data visualization.

Other Abstract

Object detection is a long-standing fundamental problem in computer vision. Its goal is to determine whether there are object instances of a given category in a given image and, if so, to return the spatial location and bounding box of each instance. Image visualization refers to the technique of selecting images of interest from an image data set and displaying them; it mainly includes two processes, image summarization and layout generation. This technique plays an important role in helping people quickly capture the key information in visual big data. Image object detection and visualization analyze visual data at the level of both objects of interest and images of interest, which is of great significance for people to access and understand visual big data.

This dissertation focuses on object detection and visualization of images in complex environments. The main problems of current image object detection and visualization techniques are: (1) under memory constraints, the input scale of deep-learning-based object detectors is limited to a small range; when the images in an application scene are extremely large, existing methods usually resort to scaling or cropping, which inevitably reduces performance and increases the computational complexity of the model; (2) in some scenes, objects appear in arbitrary orientations and with large appearance variation; it is difficult for current mainstream methods to extract accurate object features, and they cannot dynamically refine their predictions according to the samples, which limits model performance; (3) existing image collage methods usually rely on handcrafted or single-scene image features, which makes a complete description of the images impossible, and for layout generation existing methods cannot simultaneously satisfy the constraints of shape retention and content preservation, resulting in unsatisfactory collage results. In response to these problems, this dissertation combines recent advances in machine learning, computer vision, and related fields to propose high-performance image object detection and visualization methods for complex scenes.

The main contributions of this dissertation are summarized as follows:
    1. This dissertation proposes a self-supervised feature augmentation network for large-scale image object detection. Input scale plays an important role in modern detection frameworks, and empirically an optimal training scale exists. However, the optimal scale usually cannot be reached when facing extremely large images under memory constraints. To solve this issue, the dissertation proposes a large-scale image object detection method based on a self-supervised feature augmentation network, which mainly includes a residual up-sampling block based on sub-pixel convolution and a guided feature augmentation module (a minimal code sketch of sub-pixel up-sampling is given after this list). By introducing real high-resolution information and an effective up-sampling method, the network with small-scale input obtains performance comparable to that with large-scale input while improving inference efficiency. This method won first place in the Mapillary Vistas Object Detection Challenge held at ECCV 2018, and extensive experiments on the Cityscapes and MS COCO datasets demonstrate its effectiveness.
    2. This dissertation proposes a dynamic refinement network for oriented and densely packed object detection. Although existing detectors have made significant progress on datasets such as Pascal VOC and MS COCO, they suffer severely when object instances appear at arbitrary angles and are densely packed. In addition, most current methods follow a static paradigm: detectors optimize model parameters on the training set and keep them fixed afterwards, so the model cannot change over samples to make flexible predictions. In response to these problems, the dissertation proposes a dynamic refinement network. First, a feature selection module based on rotating convolution is proposed: by rotating the sampling positions of the convolution kernel according to the predicted angle, the model obtains angle-invariant features (a rotation of the kernel sampling grid is sketched after this list); the module also contains multiple branches with differently shaped kernels, so the network can dynamically adjust its receptive field according to the object shape and obtain more accurate feature representations. Second, two slightly different dynamic refinement heads are designed for classification and regression, respectively, to explicitly model the uniqueness of each example; they dynamically refine the predictions in an object-aware manner and yield more accurate classification and regression results. The method has been validated on the DOTA, HRSC2016, and SKU110K-R datasets.
    3. This dissertation proposes a content-based visual summarization method for image collections with the help of object detection. Visual summarization includes two processes, image summarization and layout generation, but most existing work focuses on only one of them. Image summarization methods are mostly based on human-defined features, which leads to incomplete descriptions of object and scene information. Existing layout generation methods either do not consider the shapes of the images or cannot completely preserve their content. This dissertation proposes a method that automatically generates a visual summary based on image content. Specifically, a hidden-topic-based diversity analysis method generates the image summary by jointly considering the diversity, conciseness, and aesthetics of the image content. At the same time, a tree-based layout generation method produces a satisfactory collage through two-stage tree construction and two-stage tree optimization (a toy tree-based layout is sketched after this list).
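
To make the sub-pixel up-sampling idea in contribution 1 concrete, the following is a minimal PyTorch sketch of a residual up-sampling block built on sub-pixel convolution (pixel shuffle). It is only an illustration; the class name, channel counts, and residual design are assumptions, not the dissertation's actual module.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualSubPixelUpsample(nn.Module):
    """Illustrative residual up-sampling block built on sub-pixel convolution
    (pixel shuffle); layer sizes and the residual design are assumptions."""

    def __init__(self, channels, scale=2):
        super().__init__()
        # A 3x3 conv expands channels by scale**2 so PixelShuffle can rearrange
        # them into a (scale x scale) larger spatial grid.
        self.expand = nn.Conv2d(channels, channels * scale * scale, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)
        self.refine = nn.Conv2d(channels, channels, 3, padding=1)
        self.scale = scale

    def forward(self, x):
        # Learned sub-pixel up-sampling path.
        up = self.refine(F.relu(self.shuffle(self.expand(x))))
        # Residual path: cheap bilinear up-sampling of the input, so the learned
        # branch only needs to model the missing high-frequency detail.
        skip = F.interpolate(x, scale_factor=self.scale, mode="bilinear",
                             align_corners=False)
        return up + skip

# Example: up-sample a 256-channel feature map from 64x64 to 128x128.
feat = torch.randn(1, 256, 64, 64)
print(ResidualSubPixelUpsample(256)(feat).shape)  # torch.Size([1, 256, 128, 128])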
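
The rotating-convolution idea in contribution 2 (rotating the kernel's sampling positions by a predicted angle) can be approximated with deformable convolution. The sketch below uses torchvision's deform_conv2d with one hypothetical angle per image; it is an assumption-based illustration, not the dissertation's implementation.

import torch
from torchvision.ops import deform_conv2d

def rotated_kernel_offsets(theta, k=3):
    """Offsets that rotate the regular k x k sampling grid of a convolution by
    `theta` (one angle per image, in radians). Returns shape (N, 2*k*k, 1, 1),
    to be expanded over the spatial output size."""
    n = theta.shape[0]
    r = (k - 1) // 2
    ys, xs = torch.meshgrid(
        torch.arange(-r, r + 1, dtype=torch.float32),
        torch.arange(-r, r + 1, dtype=torch.float32),
        indexing="ij",
    )
    ys, xs = ys.reshape(-1), xs.reshape(-1)          # (k*k,) regular grid points
    cos, sin = torch.cos(theta), torch.sin(theta)    # (N,)
    # Rotate each regular sampling point (x, y) by theta.
    x_rot = xs[None] * cos[:, None] - ys[None] * sin[:, None]
    y_rot = xs[None] * sin[:, None] + ys[None] * cos[:, None]
    # deform_conv2d expects offsets relative to the regular grid, ordered (dy, dx).
    off = torch.stack([y_rot - ys[None], x_rot - xs[None]], dim=2)  # (N, k*k, 2)
    return off.reshape(n, 2 * k * k, 1, 1)

# Example: one predicted angle per image, shared across all spatial locations.
x = torch.randn(2, 16, 32, 32)
weight = torch.randn(32, 16, 3, 3)                    # ordinary 3x3 conv weights
theta = torch.tensor([0.3, -1.2])                     # e.g. from a hypothetical angle head
offset = rotated_kernel_offsets(theta).expand(-1, -1, 32, 32).contiguous()
out = deform_conv2d(x, offset, weight, padding=1)
print(out.shape)  # torch.Size([2, 32, 32, 32])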
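
For the tree-based layout generation in contribution 3, the following toy sketch recursively splits a canvas between two halves of an image list. The file names, aspect ratios, and the count-proportional area split are made up for illustration; the dissertation's two-stage tree construction and optimization are not reproduced here.

from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)

def tree_layout(images: List[Tuple[str, float]], rect: Rect) -> List[Tuple[str, Rect]]:
    """Recursively split `rect` between two halves of the image list.
    `images` holds (name, aspect_ratio) pairs; the split direction follows
    the rectangle's longer side so that cells stay reasonably shaped."""
    x, y, w, h = rect
    if len(images) == 1:
        return [(images[0][0], rect)]
    mid = len(images) // 2
    left, right = images[:mid], images[mid:]
    # Give each half an area share proportional to how many images it holds
    # (a real system would optimize this ratio against the aspect ratios).
    share = len(left) / len(images)
    if w >= h:  # split vertically along the longer side
        r1 = (x, y, w * share, h)
        r2 = (x + w * share, y, w * (1 - share), h)
    else:       # split horizontally
        r1 = (x, y, w, h * share)
        r2 = (x, y + h * share, w, h * (1 - share))
    return tree_layout(left, r1) + tree_layout(right, r2)

# Example: lay out five images (hypothetical names and aspect ratios) on a 4:3 canvas.
cells = tree_layout(
    [("a.jpg", 1.5), ("b.jpg", 0.8), ("c.jpg", 1.0), ("d.jpg", 1.3), ("e.jpg", 0.7)],
    (0.0, 0.0, 4.0, 3.0),
)
for name, cell in cells:
    print(name, cell)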

Pages: 168
Language: Chinese (中文)
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/41612
Collection: 毕业生_博士学位论文 (Graduates / Doctoral Dissertations)
Recommended Citation (GB/T 7714):
潘兴甲. 复杂环境下的图像目标检测与可视化[D]. 中国科学院自动化研究所. 中国科学院大学,2020.
Files in This Item:
复杂环境下的图像目标检测与可视化-最终提交版.pdf (62004 KB, Adobe PDF), thesis, restricted access (限制开放), CC BY-NC-SA