Research on Active Object Detection Methods Based on Deep Reinforcement Learning
许诺
2022-11-29
Pages: 122
Degree type: Doctoral
Chinese abstract

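Active object detection aims to use imaging control, image processing, and related technologies to mimic the way the human visual system processes information: it handles image sequences through continuous multi-step decision-making and intelligently acquires key information, thereby better serving traditional passive object detection. The task has significant research and application value in many fields, including earth observation systems, human-computer interaction, and robotics. At present, traditional object detection methods are limited by their passivity: (1) they cannot autonomously select the optimal parameter configuration; (2) they adapt poorly to changing environments; (3) they cannot use evaluation feedback in time for reflection and planning. To address these shortcomings of passive object detection, this thesis first defines activeness and proposes an active object detection framework. Activeness refers to adaptively adjusting one or more parameter configurations in the visual system through sequential decision-making, including image attribute configuration, bounding box position and shape, camera position and pose, and network hyper-parameter configuration, in order to improve the final performance of the visual task. The active object detection framework consists of three modules, namely an imaging module, a camera control module, and an object detection module, and can carry out visual perception and decision cognition tasks at the same time. This series of studies models activeness with deep reinforcement learning and proposes four active configuration learning strategies for the active object detection framework. The main contributions of this thesis are summarized as follows: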

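1. A bounding box learning strategy for the object detection module is proposed. To address the insufficiently accurate initial and final states in early bounding box learning methods, the agent performs a coarse-to-fine dynamic gaze algorithm for object detection, divided into two steps, "aim" and "hit". "Aim" is the initial glance, which locates the center points of all objects and gives the approximate positions of the initial boxes; "hit" is careful observation, which uses sequential decision-making to dynamically adjust the initial boxes into compact bounding boxes and predicts corner points for final fine-tuning. Unlike existing bounding box decision methods, this method uses keypoints as the carrier and detects from centers to corners, so that it has both the recognition accuracy of keypoint detectors and a coarse-to-fine, human-like visual mode. The value of the algorithm is demonstrated on several public datasets.
2. A hyper-parameter configuration learning strategy for the training process is proposed. The agent autonomously learns and selects hyper-parameters such as the learning rate for hyper-parameter optimization, making the most of the neural network's potential. Neural network training is inherently affected by sensitive hyper-parameters and delayed performance evaluation feedback. To address these two problems, the method draws on attention and memory mechanisms to learn short-term and long-term relationships among hyper-parameter configurations for configuration sampling, and can effectively locate all types of high-performance hyper-parameter configurations in a huge search space. In addition, bootstrapping is used to address the shortage of training samples caused by delayed evaluation metrics, improving online learning efficiency. The feasibility of the method is verified on several public datasets.
3. An image attribute configuration learning strategy for the imaging module is proposed. To address the separation of imaging and detection in traditional detection tasks, the agent applies an adaptive joint learning algorithm for image brightness and object scale as image preprocessing, actively improving image quality so that detection on low-quality images improves without retraining the detector. The method dynamically optimizes imaging conditions according to feedback from the detector, improving imaging quality and the detector's ability to adapt to changes in image brightness and object scale. The main difference from existing image enhancement methods is that this method is driven by detection performance: image enhancement serves detection rather than being evaluated by human visual inspection. The effectiveness of the proposed method is verified in numerous experiments.
4. A camera parameter configuration learning strategy for the camera control module is proposed. To further address the separation of camera control, imaging, and detection in traditional detection tasks, the agent adjusts the camera parameter configuration for camera control according to detection feedback on the images the camera captures. The camera can change its viewing angle, height, and exposure compensation to find the camera parameter configuration that optimizes detection performance. Compared with traditional object detection, which can only process static images, this dynamic way of processing images bridges the gap between visual perception and decision cognition. Datasets and corresponding environments were built in both real and virtual settings, namely a small airport and a virtual park. At the small airport, a variety of aircraft images with different viewing angles and spatial resolutions were obtained. In the virtual park, the camera parameter configuration can be adjusted by controlling a virtual camera in order to recognize various cars. The advantages of the proposed algorithm are demonstrated on these two datasets and on one public multi-view dataset.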

English abstract

Active Object Detection (AOD) aims to apply imaging control, image processing, and other technologies to simulate human vision, processing image sequences through multi-step decision-making to obtain key information and thus better serve traditional passive object detection. This task has both theoretical significance and application value in many fields, such as earth observation, human-computer interaction, and robotics. Traditional detectors are subject to three restrictions due to their passivity: (1) inability to select the optimal configuration independently; (2) difficulty adapting to changing environments; (3) inability to use evaluation feedback in time for reflection and planning. Therefore, in view of the shortcomings of passive object detection, the concept of activeness and the framework of AOD are first defined: activeness refers to the adaptive adjustment of parameter configurations in the visual system, e.g., image attributes, bounding box position and shape, camera position and pose, and network hyper-parameters, through sequential decision-making to improve the final performance. The AOD framework includes an imaging module, a camera control module, and an object detection module, and completes the tasks of visual perception and decision cognition. Our research models activeness with deep reinforcement learning and proposes four active configuration learning strategies for the AOD framework. The contributions are summarized as follows:

1. A bounding box learning strategy for the object detection module is proposed. To solve the problem of inaccurate initial and final states in early bounding box learning methods, our agent applies a dynamic coarse-to-fine gaze for object detection, divided into two steps, AIM and HIT. AIM is the first glance, which locates the center points of all objects and the approximate positions of the initial boxes; HIT is careful observation, which dynamically adjusts the initial boxes through sequential decisions to obtain compact bounding boxes, and predicts the corner points for refinement. Unlike existing decision-making detectors, our method uses keypoints as the carrier to detect from centers to corners, thus achieving a human-like, high-performance visual mode. The value of our method is demonstrated on public datasets.
2. A hyper-parameter configuration learning strategy for the training process is proposed. Our agent autonomously selects hyper-parameters such as the learning rate for hyper-parameter optimization, to maximize the potential of neural networks. Network training is subject to two problems: sensitive hyper-parameters and delayed evaluation feedback. To address them, our method uses attention and memory mechanisms to learn both short-term and long-term relationships and find high-performance configurations effectively in a huge search space. Furthermore, bootstrapping is adopted to address the lack of training samples caused by delayed evaluation feedback, improving the efficiency of online training. The practicality of our method is verified on several public datasets.
3. An image configuration learning strategy for the imaging module is proposed. To solve the problem of imaging and detection being separated in traditional detection tasks, our agent implements joint adaptive learning of image brightness and object scale for image preprocessing, actively improving image quality and the detection results on low-quality images without retraining the detector. Our method dynamically optimizes imaging conditions based on feedback from the detector, improving imaging quality and the detector's ability to adapt to brightness and scale changes. The main difference from existing image enhancement methods is that our method is oriented towards detection performance, i.e., image enhancement serves detection rather than human visual inspection. Experiments verify the effectiveness of our method.
4. A camera configuration learning strategy for the camera control module is proposed. To further solve the problem of camera control, imaging, and detection being separated in traditional detection tasks, our agent adjusts the camera configuration for camera control according to feedback from the detector. In our work, the camera is allowed to change its angle of view, height, and exposure compensation to achieve higher detection performance. Compared with traditional detectors, our method processes images dynamically rather than statically, eliminating the gap between perception and cognition. Two environments, a small airport and a virtual park, are established. In the former, multi-view, multi-scale aircraft images can be obtained. In the latter, the virtual camera can be controlled to image various cars. The advantages of our method are demonstrated on the above two datasets and a public dataset.
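The feedback loop shared by contributions 3 and 4 (adjust a configuration, observe the detector's response, adjust again) can be illustrated with a toy sketch. This is not the thesis's method: it swaps deep reinforcement learning for tabular Q-learning, reduces the configuration to a single discrete brightness level, and replaces the detector with a hypothetical scoring function. The names `detector_score`, `step`, `train`, and `greedy_rollout`, along with all constants, are assumptions made for illustration only.

```python
import random

N_LEVELS = 9          # hypothetical discrete brightness settings 0..8
TARGET = 5            # setting at which the stand-in detector scores best
ACTIONS = (-1, 0, 1)  # darken, keep, brighten


def detector_score(level):
    """Stand-in for detector feedback; a real system would run a detector."""
    return 1.0 - abs(level - TARGET) / (N_LEVELS - 1)


def step(level, action):
    """Apply one adjustment and return (new_level, reward).

    The reward is the change in the detection score, so the agent is paid
    only for adjustments that help the detector.
    """
    new_level = min(max(level + action, 0), N_LEVELS - 1)
    return new_level, detector_score(new_level) - detector_score(level)


def train(episodes=3000, horizon=10, alpha=0.5, gamma=0.5, epsilon=0.3, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (assumed settings)."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_LEVELS)]
    for _ in range(episodes):
        level = rng.randrange(N_LEVELS)  # each episode starts at a random setting
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[level][i])
            new_level, reward = step(level, ACTIONS[a])
            # One-step Q-learning update toward reward plus discounted best value.
            target = reward + gamma * max(q[new_level])
            q[level][a] += alpha * (target - q[level][a])
            level = new_level
    return q


def greedy_rollout(q, start, steps=10):
    """Follow the learned greedy policy from `start`; return the final level."""
    level = start
    for _ in range(steps):
        a = max(range(len(ACTIONS)), key=lambda i: q[level][i])
        level, _ = step(level, ACTIONS[a])
    return level


if __name__ == "__main__":
    q = train()
    print([greedy_rollout(q, s) for s in range(N_LEVELS)])
```

Because the reward is the difference in detection score rather than a measure of visual quality, the learned policy is driven purely by what helps the detector, which mirrors the detection-oriented framing of the abstract.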

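Keywords: object detection; deep reinforcement learning; active object detection; deep learning
Language: Chinese
Seven major directions, sub-direction classification: object detection, tracking, and recognition
State Key Laboratory planned direction classification: visual information processing
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/50606
Collection: Graduates_Doctoral dissertations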
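Recommended citation (GB/T 7714):
许诺. 基于深度强化学习的主动目标检测方法研究 [Research on Active Object Detection Methods Based on Deep Reinforcement Learning][D], 2022.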
Files in this item
File name/size: 毕业论文-许诺.pdf (9588KB); document type: thesis; access type: restricted; license: CC BY-NC-SA