Research on Representation-Enhanced Deep Reinforcement Learning Algorithms
张清扬
2024-05-15
Pages: 158
Degree type: Doctoral
Chinese Abstract

Thanks to the powerful approximation capability of deep neural networks, deep reinforcement learning can learn representations and policies simultaneously through end-to-end training. By trial and error in the environment, a deep reinforcement learning agent receives reward signals and adjusts its representations and policy to maximize the cumulative return. However, reward-driven representation learning suffers from slow convergence and from overfitting to the reward signal. By contrast, human understanding of the physical world is not driven entirely by rewards; it is more complex and diverse and has a hierarchical structure. Specifically, humans usually decompose a complex task into multiple subtasks, learning representations and making decisions at different task levels. Moreover, when solving cooperative tasks, human groups form an explicit division of labor and collaboration, decomposing complex group decisions into simpler, more manageable sub-group decisions.
This hierarchical mechanism of cognition and decision-making helps people gradually build more abstract, higher-level representations and thereby understand and cope with the complexity of the physical world more effectively. It also provides valuable inspiration for designing more flexible and intelligent deep reinforcement learning algorithms.

Deep reinforcement learning typically uses vectors to represent states, actions, and other relevant information. Compared with vector representations, however, natural language, a form of representation closer to the way humans think, can intuitively convey much richer information. In recent years, large language models have demonstrated powerful abilities in semantic understanding and generation. Introducing large language models into deep reinforcement learning to generate natural-language representations promises more flexible, more diverse, and more interpretable representations and decisions. This combination offers new ideas and methods for the development of deep reinforcement learning algorithms and is expected to advance their application, with better performance, across a wide range of tasks and scenarios.

Drawing on the hierarchical mechanism of human cognition and decision-making, this thesis studies representation enhancement for several deep reinforcement learning methods. It aims to address the challenges these methods face, such as long-horizon sparse rewards and partial observability, and to improve their performance in balancing exploration and exploitation, promoting multi-agent collaboration, and increasing scalability and interpretability. Using representation learning methods and large language models, this thesis enhances the representations of key elements of deep reinforcement learning methods, including subtasks, consensus, and grouping, in both vector and natural-language form.
The research content and main contributions of this thesis cover the following three aspects:

1. A hierarchical deep reinforcement learning method with enhanced subtask representations.
This work targets single-agent decision-making scenarios and decomposes the task hierarchically. It treats the single-agent system as a single-level structure with only the individual level and jointly considers state transitions at both the abstract and the atomic time scale, using representation learning to obtain subtask representations with temporal abstraction and temporal consistency. On this basis, it constructs a latent-space landmark graph to represent the task and, through theoretical derivation, transforms the policy-solving problem of hierarchical deep reinforcement learning into a path-planning problem on this graph. The nodes of the latent-space landmark graph are the learned latent representations of subtasks, and its edges correspond to transitions between subtasks. In addition, this work designs a subtask selection strategy based on the landmark graph, achieving a better balance between exploration and exploitation. Compared with existing methods, it learns subtask representations with temporal abstraction and temporal consistency and, by building the latent-space landmark graph, markedly improves sample efficiency and asymptotic performance in decision-making tasks with long-horizon sparse rewards. Two algorithm variants are implemented; they construct the landmark graph with different strategies and show different advantages in computational efficiency and performance.
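As a concrete illustration (not the thesis implementation; the module names, loss weights, and the abstract time scale k are all assumptions), a minimal PyTorch sketch of learning subtask latents with temporal consistency (adjacent steps map to nearby latents) and temporal abstraction (a learned transition model at a coarser time scale) might look like this:

```python
# Hypothetical sketch: subtask latents with temporal consistency and
# temporal abstraction. Not the thesis code; names and weights are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubtaskEncoder(nn.Module):
    def __init__(self, state_dim: int, latent_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Transition model at the abstract time scale: predicts the latent
        # k steps ahead from the current latent.
        self.abstract_model = nn.Linear(latent_dim, latent_dim)

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        return self.net(states)

def representation_loss(encoder: SubtaskEncoder, states: torch.Tensor,
                        k: int = 10) -> torch.Tensor:
    """states: a (T, state_dim) trajectory with T > k."""
    z = encoder(states)  # (T, latent_dim)
    # Temporal consistency: consecutive atomic steps share a subtask latent.
    consistency = F.mse_loss(z[1:], z[:-1].detach())
    # Temporal abstraction: latents k steps apart follow the learned
    # abstract-scale transition model.
    abstraction = F.mse_loss(encoder.abstract_model(z[:-k]), z[k:].detach())
    return consistency + abstraction
```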

2. A multi-agent deep reinforcement learning method with enhanced consensus representations. This work targets multi-agent decision-making scenarios and organizes the group hierarchically. It divides the multi-agent system into a two-level group-individual structure and proposes the concept of dual-channel consensus, comprising intra-agent and inter-agent consensus. At the individual level, it learns latent representations with temporal abstraction and temporal consistency as each agent's macro understanding of the task (intra-agent consensus). At the group level, it aligns the agents' macro understandings of the task through representation learning (inter-agent consensus). Latent representations that possess both temporal abstraction and temporal consistency and that satisfy the alignment relation are defined as dual-channel consensus representations and are used to guide the agents' distributed decision-making. Introducing dual-channel consensus overcomes the partial-observability challenge of multi-agent systems and markedly improves multi-agent collaboration. The proposed method can be flexibly combined with various multi-agent deep reinforcement learning algorithms, improving their ability to handle partial observability and to promote collaboration. Compared with existing methods, it enriches the information available during distributed execution by implicitly inferring the dual-channel consensus representations, without inter-agent communication or modeling of other agents.
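For intuition about the inter-agent channel, a minimal sketch of an alignment loss is given below; the mean-latent target and all names are illustrative assumptions, not the thesis design:

```python
# Hypothetical sketch: align per-agent latent "macro understandings" of the
# task toward a shared consensus during centralized training.
import torch
import torch.nn.functional as F

def inter_agent_alignment_loss(latents: torch.Tensor) -> torch.Tensor:
    """latents: (n_agents, latent_dim), one latent per agent at a timestep."""
    consensus = latents.mean(dim=0, keepdim=True).detach()  # shared target
    return F.mse_loss(latents, consensus.expand_as(latents))
```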

3. A hierarchical multi-agent deep reinforcement learning method with enhanced grouping representations. This work targets multi-agent decision-making scenarios, decomposing the task and organizing the group hierarchically. It divides the multi-agent system into a three-level group-subgroup-individual structure and, by enhancing grouping representations, significantly improves the ability of multi-agent deep reinforcement learning methods to overcome partial observability, promote multi-agent collaboration, and increase scalability and interpretability. This research comprises two works, which enhance the grouping representation in different forms.
(1) The first work realizes vector-form representation enhancement, in which the grouping representation is formed and expressed implicitly through representation learning. It uses an attention model as the grouping planner, which forms groups adaptively according to the environment state (group level); a minimal sketch of such a planner follows this list. By learning individual subtask representations (individual level) and establishing intra-group consensus (subgroup level), it obtains representations of group tasks. Furthermore, by training the distributional relations of the group-task representations in the representation space, it additionally learns representations of group identifiers (subgroup level). Together, the group-task and group-identifier representations form the grouping representation.
(2) The second work realizes natural-language representation enhancement, in which the grouping representation is generated by large language models. It uses large language models as the grouping planner, leveraging their language generation ability to produce group tasks and group identifiers described in natural language. It also introduces a reflector implemented with large language models; the reflector distills experience by reflecting on historical trajectories and uses that experience to refine the grouping planner's policy.
Compared with existing methods, the first work significantly improves scalability to multi-agent systems with dynamic team compositions by enhancing the vector-form grouping representation, while the second work significantly improves the interpretability of the decision-making process by using large language models to generate groupings described in natural language.
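As a rough picture of the attention-based grouping planner in the first work, here is a minimal sketch under assumed details (learnable group queries, hard argmax assignment); it is illustrative rather than the thesis architecture:

```python
# Hypothetical sketch: learnable group queries attend over agent embeddings;
# each agent joins the group whose query attends to it most strongly.
import torch
import torch.nn as nn

class GroupingPlanner(nn.Module):
    def __init__(self, agent_dim: int, n_groups: int, d: int = 64):
        super().__init__()
        self.group_queries = nn.Parameter(torch.randn(n_groups, d))
        self.key = nn.Linear(agent_dim, d)

    def forward(self, agent_states: torch.Tensor):
        """agent_states: (n_agents, agent_dim) -> a group index per agent."""
        keys = self.key(agent_states)  # (n_agents, d)
        scores = self.group_queries @ keys.t() / keys.shape[-1] ** 0.5
        assignment = scores.argmax(dim=0)  # (n_agents,)
        return assignment, scores.softmax(dim=0)
```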
 

English Abstract

Thanks to the powerful approximation ability of deep neural networks, deep reinforcement learning can simultaneously learn representations and policies in an end-to-end training process. Through trial and error in the environment, deep reinforcement learning agents receive reward signals from the environment and adjust their representations and policies to maximize cumulative rewards. However, reward-driven representation learning suffers from slow convergence and overfitting to reward signals. In contrast, human understanding of the physical world is not entirely driven by rewards but is more complex and diverse, with a hierarchical structure. Specifically, humans often decompose complex tasks into multiple subtasks, learning representations and making decisions at different task levels. Additionally, human groups form explicit divisions of labor and collaboration when solving cooperative tasks, decomposing complex group decisions into simpler, more manageable sub-group decisions. This hierarchical cognitive and decision-making mechanism helps humans gradually establish more abstract and high-level representations, thereby understanding and dealing with the complexity of the physical world more effectively. Human hierarchical cognitive and decision-making mechanisms provide valuable insights for designing more flexible and intelligent deep reinforcement learning methods.

Deep reinforcement learning typically employs vectors to represent states, actions, and other relevant information. However, compared to vector-based representations, natural language, a representation that is more expressive and closer to the way humans think, can intuitively convey richer information. In recent years, large language models have demonstrated powerful semantic understanding and generation capabilities. Introducing large language models into deep reinforcement learning to generate natural language representations holds the promise of providing more flexible, diverse, and explainable representations and decisions. This combination provides novel insights for the development of deep reinforcement learning methods and is expected to drive their application in various tasks and scenarios with better performance.

This thesis draws inspiration from the hierarchical cognitive and decision-making mechanisms of humans to conduct representation-enhancement research on various deep reinforcement learning methods. The aim is to address challenges faced by these methods, such as long-horizon sparse rewards and partial observability, and to improve their performance in exploration-exploitation balance, multi-agent collaboration, scalability, and explainability.
This thesis enhances the representations of key elements such as subtasks, consensus, and grouping in deep reinforcement learning methods through representation learning techniques and large language models, realizing both vector and natural language forms of representation enhancement.
The research content and main innovations of this thesis cover the following three aspects:

1. Subtask representation enhancement for hierarchical deep reinforcement learning methods. This research focuses on the hierarchical decomposition of tasks in single-agent decision-making scenarios. The single-agent system is considered as a single-level structure with only the individual level, and state transitions at both the abstract and the atomic time scale are considered jointly. Through representation learning, the research learns subtask representations with temporal abstraction and temporal consistency. Based on these, a latent-space landmark graph is constructed to represent tasks, and the policy-solving problem of hierarchical deep reinforcement learning is theoretically transformed into a path-planning problem on the latent-space landmark graph. The nodes of the graph are the latent representations of the learned subtasks, while its edges correspond to transitions between subtasks. In addition, this research designs a subtask selection strategy based on the latent-space landmark graph, achieving a better balance between exploration and exploitation. Compared to existing methods, this research learns subtask representations with temporal abstraction and temporal consistency and, by establishing the latent-space landmark graph, significantly improves sample efficiency and asymptotic performance in decision-making tasks with long-horizon sparse rewards. Two algorithm variants are implemented, each employing a different strategy to construct the latent-space landmark graph and demonstrating different advantages in computational efficiency and performance.
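To make the graph view concrete, the following minimal sketch (assumed details: edge costs as latent-space distances, generic shortest-path planning via networkx; none of the function names come from the thesis) builds a landmark graph over subtask latents and picks the next subtask:

```python
# Hypothetical sketch: landmark graph over subtask latents, with high-level
# subtask selection as shortest-path planning.
import networkx as nx
import numpy as np

def build_landmark_graph(latents, transitions):
    """latents: list of per-subtask latent vectors;
    transitions: iterable of (i, j) observed subtask transitions."""
    g = nx.DiGraph()
    for i, z in enumerate(latents):
        g.add_node(i, latent=np.asarray(z))
    for i, j in transitions:
        # Edge cost as latent-space distance (an assumption of this sketch).
        cost = float(np.linalg.norm(np.asarray(latents[i]) - np.asarray(latents[j])))
        g.add_edge(i, j, weight=cost)
    return g

def next_subtask(graph: nx.DiGraph, current: int, goal: int) -> int:
    """Return the next landmark on the cheapest path from current to goal."""
    path = nx.shortest_path(graph, current, goal, weight="weight")
    return path[1] if len(path) > 1 else goal
```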

2. Consensus representation enhancement for multi-agent deep reinforcement learning methods. This research targets multi-agent decision-making scenarios and organizes groups hierarchically. The research divides the multi-agent system into two levels, the group level and the individual level, and proposes the concept of dual-channel consensus, which includes consensus within agents and between agents. At the individual level, the research learns latent representations with temporal abstraction and temporal consistency as each agent's macro understanding of the task (intra-agent consensus). At the group level, the research aligns the agents' macro understandings of the task through representation learning (inter-agent consensus). Latent representations that simultaneously possess temporal abstraction and temporal consistency and satisfy the alignment relationship are defined as dual-channel consensus representations, which guide the distributed decision-making process of the agents. The introduction of dual-channel consensus overcomes the partial-observability challenge of multi-agent systems and significantly enhances the collaborative capabilities of multiple agents. The proposed method can flexibly integrate with various multi-agent deep reinforcement learning methods, enhancing their ability to address partial observability and promote multi-agent collaboration. Compared to existing methods, this research achieves information enhancement by implicitly inferring dual-channel consensus representations during distributed execution, without requiring inter-agent communication or modeling of other agents.
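One way to picture 'implicit inference during distributed execution' is sketched below: each agent infers a consensus latent from its own observation history and conditions its local policy on it, so no communication is needed at execution time. The architecture is an assumption for illustration, not the thesis model:

```python
# Hypothetical sketch: a decentralized policy conditioned on a consensus
# latent inferred from the agent's own observation history.
import torch
import torch.nn as nn

class ConsensusConditionedPolicy(nn.Module):
    def __init__(self, obs_dim: int, latent_dim: int, n_actions: int):
        super().__init__()
        self.consensus_head = nn.GRU(obs_dim, latent_dim, batch_first=True)
        self.policy = nn.Linear(obs_dim + latent_dim, n_actions)

    def forward(self, obs_history: torch.Tensor) -> torch.Tensor:
        """obs_history: (1, T, obs_dim), one agent's local observations."""
        _, h = self.consensus_head(obs_history)  # infer consensus latent
        z = h[-1]                                # (1, latent_dim)
        return self.policy(torch.cat([obs_history[:, -1], z], dim=-1))
```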

3. Grouping representation enhancement for hierarchical multi-agent deep reinforcement learning. This research targets multi-agent decision-making scenarios, hierarchically decomposing the task and hierarchically organizing the group. The research divides the multi-agent system into a three-level structure of group, sub-group, and individual and, by enhancing grouping representations, significantly improves the ability of multi-agent deep reinforcement learning methods to overcome partial observability, promote multi-agent collaboration, and improve scalability and explainability. The research comprises two works, each enhancing the grouping representation in a different form.
(1) The first work implements vector-based representation enhancement, where the grouping representation is implicitly formed and expressed through representation learning. This work utilizes an attention model as the grouping planner, adaptively organizing groups based on environmental states (group level). By learning individual subtask representations (individual level) and establishing intra-group consensus (sub-group level), this work obtains representations of group tasks. Additionally, by training the distributional relationships of the group-task representations in the representation space, it further learns representations of group identifiers (sub-group level). The combination of the group-task and group-identifier representations forms the grouping representation.
(2) The second work implements natural language-based representation enhancement, where the grouping representation is generated by large language models. This work utilizes large language models as the grouping planner, leveraging their language generation capabilities to obtain natural language descriptions of group tasks and group identifiers. Additionally, this work introduces a reflector implemented by large language models; the reflector gains experience by reflecting on historical trajectories and optimizes the grouping planner's policy based on this experience (a minimal sketch of this planner-reflector loop follows this list).
Compared to existing methods, the first work significantly improves the scalability of algorithms to multi-agent systems with dynamically changing team compositions by enhancing the vector-based grouping representation, while the second work significantly enhances the explainability of the decision-making process by using large language models to generate natural language descriptions of groupings.
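A minimal sketch of the planner-reflector loop from the second work is given below; the prompts, function names, and the generic llm callable are all assumptions (any chat-completion API could stand in):

```python
# Hypothetical sketch: an LLM proposes natural-language groupings; an LLM
# reflector distills lessons from past episodes to refine future plans.
def plan_groups(llm, task_description: str, agent_summaries: list[str],
                lessons: list[str]) -> str:
    prompt = (
        f"Task: {task_description}\n"
        "Agents:\n" + "\n".join(agent_summaries) + "\n"
        "Lessons from past episodes:\n" + "\n".join(lessons) + "\n"
        "Partition the agents into groups. For each group, list its members "
        "and give a natural-language group task and group identifier."
    )
    return llm(prompt)  # natural-language grouping plan

def reflect(llm, trajectory_summary: str) -> str:
    prompt = (
        "You are a reflector. From this episode trajectory summary, state "
        "one concrete lesson for better grouping next time.\n"
        + trajectory_summary
    )
    return llm(prompt)  # append the result to `lessons` for the next plan
```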

Keywords: deep reinforcement learning, representation learning, hierarchical reinforcement learning, multi-agent reinforcement learning, large language models
Subject area: Artificial Intelligence
Discipline: Engineering :: Control Science and Engineering
Indexing category: Other
Language: Chinese
Document type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/57198
Collection: Graduates - Doctoral Dissertations
Recommended citation (GB/T 7714):
张清扬. Research on Representation-Enhanced Deep Reinforcement Learning Algorithms [D], 2024.
Files in this item:
File name (size) / document type / access / license
表示增强的深度强化学习算法研究.pdf (37765 KB), thesis, restricted access, CC BY-NC-SA