表示增强的深度强化学习算法研究 (Research on Representation-Enhanced Deep Reinforcement Learning Algorithms) | |
张清扬 | |
2024-05-15 | |
Pages | 158
Degree Type | Doctoral
Abstract (Chinese) | Thanks to the powerful approximation ability of deep neural networks, deep reinforcement learning can learn representations and policies simultaneously in an end-to-end training process. Through trial and error in the environment, deep reinforcement learning agents receive reward signals and adjust their representations and policies to maximize cumulative return. However, reward-driven representation learning suffers from slow convergence and from overfitting to the reward signal. In contrast, human understanding of the physical world is not entirely reward-driven; it is more complex and diverse, and hierarchically structured. Specifically, humans typically decompose complex tasks into multiple subtasks, and learn representations and make decisions at different task levels. Moreover, when solving cooperative tasks, human groups form explicit divisions of labor and collaborative relationships, decomposing complex group decisions into simpler, more manageable sub-group decisions. Deep reinforcement learning typically uses vectors to represent states, actions, and other relevant information. However, compared with vector-based representations, natural language, as a more advanced form of representation that is closer to the way humans think, can intuitively convey richer information. In recent years, large language models have demonstrated powerful semantic understanding and generation capabilities. Introducing large language models into deep reinforcement learning to generate natural-language representations promises more flexible, more diverse, and more explainable representations and decisions. This combination offers new ideas and approaches for the development of deep reinforcement learning algorithms, and is expected to drive their application across various tasks and scenarios with better performance. Drawing on the hierarchical cognition and decision-making mechanisms of humans, this thesis conducts representation-enhancement research on several deep reinforcement learning methods, aiming to address challenges such as long-horizon sparse rewards and partial observability, and to improve their performance in balancing exploration and exploitation, promoting multi-agent collaboration, and improving scalability and explainability. Using representation learning methods and large language models, this thesis enhances the representations of key elements of deep reinforcement learning methods, including subtasks, consensus, and grouping, in both vector and natural-language form.

1. A hierarchical deep reinforcement learning method with enhanced subtask representations. This work targets single-agent decision-making scenarios and hierarchically decomposes tasks. It learns subtask representations with temporal abstraction and temporal consistency, constructs a hidden-space landmark graph to represent the task, and transforms policy solving in hierarchical deep reinforcement learning into path planning over that graph, significantly improving sample efficiency and asymptotic performance in long-horizon sparse-reward tasks.

2. A multi-agent deep reinforcement learning method with enhanced consensus representations. This work targets multi-agent decision-making scenarios and hierarchically organizes the group. It divides the multi-agent system into a two-level, group-individual structure and proposes the concept of dual-channel consensus, comprising intra-agent and inter-agent consensus. At the individual level, it learns latent representations with temporal abstraction and temporal consistency as a single agent's macro understanding of the task (intra-agent consensus). At the group level, it aligns the agents' macro understandings of the task through representation learning (inter-agent consensus). Latent representations that possess both temporal abstraction and temporal consistency, and that satisfy the alignment relation, are defined as dual-channel consensus representations, which guide the agents' distributed decision-making. The introduction of dual-channel consensus overcomes the partial observability of multi-agent systems and significantly improves multi-agent collaboration. The proposed method can be flexibly combined with various multi-agent deep reinforcement learning algorithms, improving their ability to handle partial observability and to promote collaboration. Compared with existing methods, it achieves information enhancement by implicitly inferring dual-channel consensus representations during distributed execution, without inter-agent communication or modeling of other agents.

3. A hierarchical multi-agent deep reinforcement learning method with enhanced grouping representations. This work targets multi-agent decision-making scenarios, hierarchically decomposing the task and organizing the group, respectively. It divides the multi-agent system into a three-level, group-subgroup-individual structure and, by enhancing the grouping representations, significantly improves the ability of multi-agent deep reinforcement learning methods to overcome partial observability, promote collaboration, and improve scalability and explainability. This work comprises two studies, each using a different form of representation to enhance the grouping representations. |
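As a rough illustration of the thesis's second contribution, the two "channels" of consensus could be sketched as two penalties on the agents' latent representations. Everything below is an assumption for illustration only, not the thesis's actual objectives: intra-agent consensus is rendered as a temporal-consistency penalty (successive latents of one agent should change slowly), and inter-agent consensus as an alignment penalty toward the group mean latent.

```python
# Hypothetical sketch of "dual-channel consensus" as two latent-space
# penalties. All function names and loss forms are illustrative
# assumptions; the thesis's real objectives are not specified here.

def mean_vec(vectors):
    """Elementwise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def intra_agent_loss(latent_sequence):
    """Temporal-consistency penalty: successive latents of one agent
    should stay close, encouraging a stable macro task understanding."""
    return sum(sq_dist(a, b) for a, b in zip(latent_sequence, latent_sequence[1:]))

def inter_agent_loss(latents_per_agent):
    """Alignment penalty: each agent's latent is pulled toward the
    group mean, so all agents share one macro task understanding."""
    center = mean_vec(latents_per_agent)
    return sum(sq_dist(z, center) for z in latents_per_agent) / len(latents_per_agent)

def dual_channel_loss(sequences, weight=1.0):
    """Combined objective over per-agent latent sequences (equal length):
    intra-agent consistency plus weighted inter-agent alignment per step."""
    intra = sum(intra_agent_loss(seq) for seq in sequences)
    inter = sum(inter_agent_loss(step) for step in zip(*sequences))
    return intra + weight * inter
```

In this toy form, the loss is zero exactly when every agent's latent is constant over time and all agents' latents coincide at every step, which mirrors the informal definition of a dual-channel consensus representation in the abstract.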
Abstract (English) | Thanks to the powerful approximation ability of deep neural networks, deep reinforcement learning can simultaneously learn representations and policies in an end-to-end training process. Through trial and error in the environment, deep reinforcement learning agents receive reward signals from the environment and adjust their representations and policies to maximize cumulative rewards. However, reward-driven representation learning suffers from slow convergence and from overfitting to the reward signal. In contrast, human understanding of the physical world is not entirely driven by rewards but is more complex and diverse, with a hierarchical structure. Specifically, humans often decompose complex tasks into multiple subtasks, learning representations and making decisions at different task levels. Additionally, human groups form explicit divisions of labor and collaboration when solving cooperative tasks, decomposing complex group decisions into simpler, more manageable sub-group decisions. This hierarchical cognitive and decision-making mechanism helps humans gradually build more abstract, higher-level representations and thereby understand and handle the complexity of the physical world more effectively, providing valuable insights for designing more flexible and intelligent deep reinforcement learning methods. Deep reinforcement learning typically employs vectors to represent states, actions, and other relevant information. However, compared with vector-based representations, natural language, as a more advanced form of representation closer to human thought, can intuitively convey richer information. In recent years, large language models have demonstrated powerful semantic understanding and generation capabilities. Introducing large language models into deep reinforcement learning to generate natural-language representations holds the promise of more flexible, diverse, and explainable representations and decisions. This combination provides novel insights for the development of deep reinforcement learning methods and is expected to drive their application in various tasks and scenarios with better performance. Drawing inspiration from the hierarchical cognitive and decision-making mechanisms of humans, this thesis conducts representation-enhancement research on various deep reinforcement learning frameworks, aiming to address challenges these frameworks face, such as long-horizon sparse rewards and partial observability, and to improve their performance in balancing exploration and exploitation, multi-agent collaboration, scalability, and explainability.

1. A subtask-representation-enhanced hierarchical deep reinforcement learning method. This research focuses on the hierarchical decomposition of tasks in single-agent decision-making scenarios. The single-agent system is treated as a single-layer, individual-level structure, and state transitions at both abstract and atomic time scales are jointly considered. Through representation learning, the research learns subtask representations with temporal abstraction and temporal consistency. On this basis, a hidden-space landmark graph is constructed to represent tasks, and the policy-solving problem in hierarchical deep reinforcement learning is theoretically transformed into a path-planning problem over the hidden-space landmark graph. The nodes of the graph represent the latent representations of the learned subtasks, while its edges correspond to transitions between subtasks. In addition, the research designs a subtask selection strategy based on the hidden-space landmark graph, achieving a better balance between exploration and exploitation. Compared with existing methods, this research learns subtask representations with temporal abstraction and temporal consistency, and significantly improves sample efficiency and asymptotic performance in decision-making tasks with long-horizon sparse rewards by building the hidden-space landmark graph. Two algorithm variants are implemented, each employing a different strategy to construct the hidden-space landmark graph and exhibiting different advantages in computational efficiency and performance.

2. A consensus-representation-enhanced multi-agent deep reinforcement learning method. This research targets multi-agent decision-making scenarios and organizes the group hierarchically. It divides the multi-agent system into two levels, group and individual, and proposes the concept of dual-channel consensus, which comprises intra-agent and inter-agent consensus. At the individual level, the research learns latent representations with temporal abstraction and temporal consistency as an individual agent's macro understanding of the task (intra-agent consensus). At the group level, it aligns the agents' macro understandings of the task through representation learning (inter-agent consensus). Latent representations that simultaneously possess temporal abstraction and temporal consistency, and that satisfy the alignment relationship, are defined as dual-channel consensus representations, which guide the agents' distributed decision-making. The introduction of dual-channel consensus overcomes the partial-observability challenge of multi-agent systems and significantly enhances the agents' collaborative capabilities. The proposed method can be flexibly integrated with various multi-agent deep reinforcement learning methods, enhancing their ability to address partial observability and promote collaboration. Compared with existing methods, this research achieves information enhancement by implicitly inferring dual-channel consensus representations during distributed execution, without requiring inter-agent communication or modeling of other agents.

3. A group-representation-enhanced hierarchical multi-agent deep reinforcement learning method. This research targets multi-agent decision-making scenarios, hierarchically decomposing the task and organizing the group, respectively. It divides the multi-agent system into a three-level structure of group, sub-group, and individual, and, by enhancing group representations, significantly improves the ability of multi-agent deep reinforcement learning methods to overcome partial observability, promote multi-agent collaboration, and improve scalability and explainability. The research comprises two works, each employing a different form of representation to enhance the group representations. |
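The first contribution above reduces hierarchical policy solving to path planning over a "hidden-space landmark graph". A minimal sketch of that reduction, under stated assumptions, might look as follows: nodes are latent subtask representations, edges are observed subtask transitions, edge cost is Euclidean distance in the latent space, and the high-level policy runs shortest-path search. All names, the cost choice, and the planner are illustrative assumptions, not the thesis's implementation.

```python
# Hypothetical sketch of the hidden-space landmark graph idea:
# shortest-path planning over latent subtask landmarks. Names and the
# Euclidean edge cost are assumptions for illustration only.
import heapq
import math

def build_landmark_graph(landmarks, transitions):
    """landmarks: {node_id: latent_vector}; transitions: [(u, v), ...].
    Edge cost is Euclidean distance between latent vectors."""
    graph = {u: [] for u in landmarks}
    for u, v in transitions:
        graph[u].append((v, math.dist(landmarks[u], landmarks[v])))
    return graph

def plan_subtasks(graph, start, goal):
    """Dijkstra over the landmark graph: returns a subtask sequence for
    the high-level policy, or None if the goal landmark is unreachable."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == goal:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return None
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]
```

For example, with landmarks at (0,0), (1,0), (1,1) and transitions 0→1, 1→2, 0→2, planning from 0 to 2 picks the direct edge (cost √2 ≈ 1.41) over the two-hop route (cost 2.0), which is the kind of exploitation the graph-based subtask selection enables.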
Keywords | Deep Reinforcement Learning, Representation Learning, Hierarchical Reinforcement Learning, Multi-Agent Reinforcement Learning, Large Language Models
Subject Area | Artificial Intelligence
Discipline | Engineering::Control Science and Engineering
Indexed By | Other
Language | Chinese
Representative Thesis | Yes
Document Type | Doctoral thesis
Identifier | http://ir.ia.ac.cn/handle/173211/57198
Collection | Graduates_Doctoral Theses
Recommended Citation (GB/T 7714) | 张清扬. 表示增强的深度强化学习算法研究[D],2024.
Files in This Item:
File Name/Size | Document Type | Version | Access | License
表示增强的深度强化学习算法研究.pdf(37765KB) | Doctoral thesis | | Restricted Access | CC BY-NC-SA
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.