Research on Deep Reinforcement Learning Algorithms for High-Fidelity Scenarios
Niu Longyu (钮龙宇)
2024-05
Pages: 72
Subtype: Master's
Abstract

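Deep reinforcement learning has made major progress in a variety of simulated environments. However, because existing environments are limited in visual fidelity, scene complexity, and task diversity, and because existing algorithms suffer from low sample efficiency, insufficient autonomous exploration, and high latency in high-fidelity scenarios, applying deep reinforcement learning to real-world scenarios still faces many challenges. To address these problems, this thesis builds Unreal BattleGrounds (UBG), a high-fidelity 3D open-world first-person shooter (FPS) platform based on Unreal Engine, and uses it to explore hierarchical learning, imitation learning, and representation learning. The work focuses on the difficulty of complex action control for agents in high-fidelity scenarios and on the reality gap encountered when combining virtual and real settings. It proposes a series of deep reinforcement learning network frameworks and validates them experimentally in multiple application scenarios. The main contributions are as follows: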

1. Building a high-fidelity FPS scenario and proposing a hierarchical imitation learning algorithm based on computer vision perception.

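This thesis introduces Unreal BattleGrounds, a realistic 3D FPS simulation platform with customizable complexity, randomized scenes, diverse tasks, and multiple modes of scene interaction. Its state-action space is far more complex than that of classic pseudo-3D FPS platforms such as ViZDoom.
Building on this platform, and targeting the low sample efficiency, sparse rewards, and insufficient autonomous exploration of existing algorithms in high-fidelity scenarios, this thesis also proposes a two-level hierarchical reinforcement learning architecture in which an upper-level controller learns to choose "options" while lower-level workers master the assigned subtasks. A computer vision perception module supplies depth and detection information about environment components to augment the input, improving sample efficiency and agent performance. An intrinsic reward shaping method based on potential functions decomposes the task and enriches the reward signal. Keyframe-demonstration imitation learning with weight annealing speeds up early convergence without reducing the algorithm's exploration.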
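
The potential-based intrinsic reward shaping mentioned above can be illustrated with a minimal sketch. It assumes the standard form F(s, s') = gamma * Phi(s') - Phi(s); the potential used here (negative distance to a subgoal) and the names `distance_potential` and `shaped_reward` are hypothetical and are not taken from the thesis.

```python
import math


def distance_potential(agent_xy, goal_xy):
    """Hypothetical potential: negative Euclidean distance to a subgoal."""
    return -math.hypot(agent_xy[0] - goal_xy[0], agent_xy[1] - goal_xy[1])


def shaped_reward(env_reward, phi_s, phi_next, gamma=0.99, done=False):
    """Add the shaping term gamma * Phi(s') - Phi(s) to the environment reward.

    The potential of a terminal state is treated as zero, a common
    convention that is assumed here rather than taken from the thesis.
    """
    next_potential = 0.0 if done else phi_next
    return env_reward + gamma * next_potential - phi_s


# Moving from (0, 0) to (3, 0) toward a subgoal at (3, 4) shortens the
# distance from 5 to 4, so the shaping term is positive even though the
# environment itself returned no reward.
bonus = shaped_reward(
    0.0,
    distance_potential((0, 0), (3, 4)),
    distance_potential((3, 0), (3, 4)),
    gamma=1.0,
)
```

Under these assumptions the agent receives a dense signal for progress toward each subgoal, which is one way a sparse task reward can be enriched.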

2. Proposing a hierarchical representation learning algorithm based on rule-based action masking.

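This method targets algorithm latency in the asynchronous mode of high-fidelity scenarios. It uses representation learning in place of the object detector and depth predictor, which markedly improves the real-time performance of network control. Through semi-orthogonal action space decomposition, it restructures the platform's original action space into several groups of semi-orthogonal action subspaces and uses rule-based action masks to block non-orthogonal action combinations, reducing the complexity of the original action space without removing feasible action combinations. Action masks are also used to steer the agent's decision output directly according to the most general basic rules, so that the agent explores more safely and effectively.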

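This thesis also uses the classic FPS platform ViZDoom to test the generalization ability of the proposed methods. Extensive experiments show that the proposed deep reinforcement learning network frameworks achieve clear improvements in several respects and generalize well. In summary, this thesis offers new ideas and methods for research on deep reinforcement learning algorithms in high-fidelity scenarios, has broad and significant research and practical value, and is of far-reaching significance for advancing the field.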

Other Abstract

Deep reinforcement learning (DRL) has made significant progress in various simulated environments. However, applying DRL methods to real-world scenarios still faces numerous challenges, both because existing environments are limited in visual fidelity, scene complexity, and task diversity, and because existing algorithms show low sample efficiency, insufficient exploration, and high latency in high-fidelity simulations. To address these issues, this thesis builds a high-fidelity 3D open-world first-person shooter (FPS) platform named Unreal BattleGrounds (UBG) on the Unreal Engine, and uses this environment to explore hierarchical learning, imitation learning, and representation learning. The focus is on the challenges of complex agent action control in high-fidelity environments and the reality gap encountered when combining virtual and real settings. A series of deep reinforcement learning network frameworks are proposed and extensively validated in multiple application scenarios. The main contributions of this thesis are as follows:

1. Building a high-fidelity FPS benchmark and proposing a hierarchical imitation learning algorithm based on computer vision perception.

This thesis introduces a realistic 3D FPS simulation platform, Unreal BattleGrounds, with customizable complexity, random scenes, diverse tasks, and various scene interaction modes. Its state-action space is far more complex than that of classic pseudo-3D FPS platforms such as ViZDoom. Based on this platform, and to address the shortcomings of existing algorithms in high-fidelity environments (low sample efficiency, sparse rewards, and insufficient autonomous exploration), this thesis also proposes a two-level hierarchical RL architecture. The upper-level controller learns to choose "options," while the lower-level workers are responsible for mastering the assigned subtasks. A computer vision perception module obtains depth detection information about environment components to augment the input, improving sample efficiency and agent performance. Task decomposition and reward enrichment are achieved through an intrinsic reward shaping method based on potential functions. Keyframe-demonstration imitation learning with weight annealing improves early convergence efficiency without reducing the algorithm's exploration.
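
The weight-annealed imitation term described above can be sketched as follows. The abstract does not state the annealing schedule, so the linear decay, the default values, and the names `imitation_weight` and `combined_loss` are assumptions made purely for illustration.

```python
def imitation_weight(step, w0=1.0, anneal_steps=100_000):
    """Linearly decay the imitation weight from w0 to 0 (assumed schedule)."""
    frac = min(max(step / anneal_steps, 0.0), 1.0)
    return w0 * (1.0 - frac)


def combined_loss(rl_loss, imitation_loss, step, w0=1.0, anneal_steps=100_000):
    """Total loss = RL loss + annealed weight * imitation loss.

    Early in training the demonstration term dominates and speeds up
    convergence; once the weight reaches zero only the RL objective
    remains, so exploration is no longer tied to the demonstrations.
    """
    weight = imitation_weight(step, w0, anneal_steps)
    return rl_loss + weight * imitation_loss


# Halfway through the annealing window the imitation term counts for half.
halfway = combined_loss(rl_loss=2.0, imitation_loss=4.0, step=50_000)
```

Other schedules (exponential or stepwise decay, for example) fit the same interface; the point of the sketch is only that the demonstration signal fades as training proceeds.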

2. Proposing a hierarchical representation learning algorithm based on rule-based action masking.

This method addresses algorithm latency in the asynchronous mode of high-fidelity scenes by using representation learning to replace the object detector and depth predictor, significantly improving the real-time performance of network control. By decomposing the original action space into multiple sets of semi-orthogonal action subspaces and masking non-orthogonal action combinations with rule-based action masks, the complexity of the original action space is reduced without removing feasible action combinations. Additionally, action masks are used to directly control the agent's decision outputs according to the most universal basic rules, ensuring safer and more effective exploration.
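
A minimal sketch of rule-based action masking is given below. The action names and the "no firing with an empty magazine" rule are hypothetical examples, not rules taken from the thesis; the masking itself is the common technique of setting disallowed logits to negative infinity before the softmax.

```python
import math

# Hypothetical discrete actions, used only for illustration.
ACTIONS = ["move_forward", "fire", "reload"]


def rule_mask(ammo):
    """Hypothetical rule: firing is disallowed when the magazine is empty."""
    return [True, ammo > 0, True]


def masked_softmax(logits, mask):
    """Softmax over logits with masked-out actions forced to probability 0."""
    if not any(mask):
        raise ValueError("at least one action must remain allowed")
    masked = [l if m else -math.inf for l, m in zip(logits, mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]


# With no ammunition, "fire" can never be sampled, whatever its logit.
probs = masked_softmax([1.0, 2.0, 3.0], rule_mask(ammo=0))
```

Because the mask is applied before normalization, the remaining actions keep their relative preferences while the forbidden one receives exactly zero probability.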

The thesis also tests the generalization ability of the proposed methods on the classic FPS platform ViZDoom. Extensive experiments demonstrate that the proposed deep reinforcement learning network frameworks achieve significant improvements in several respects and generalize well. In conclusion, this thesis provides new ideas and methods for research on DRL algorithms in high-fidelity environments, with broad and significant research and practical value, and holds profound implications for advancing the field.

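Keyword: deep reinforcement learning; high-fidelity scenarios; hierarchical learning; imitation learning; representation learning
Subject Area: Artificial Intelligence; Pattern Recognition; Computer Perception
Language: Chinese
Sub direction classification: Reinforcement and Evolutionary Learning
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/56613
Collection: Graduates_Master's Theses
State Key Laboratory of Multimodal Artificial Intelligence Systems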
Recommended Citation
GB/T 7714
Niu Longyu. Research on Deep Reinforcement Learning Algorithms for High-Fidelity Scenarios [D], 2024.
Files in This Item:
File Name/Size | DocType | Version | Access | License
钮龙宇_中国科学院大学学位论文.pdf (9528KB) | Thesis | | Restricted access | CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.