Research on Multi-Agent Reinforcement Learning Algorithms for Multi-Task Learning and Attribute Generalization

	Research on Multi-Agent Reinforcement Learning Algorithms for Multi-Task Learning and Attribute Generalization
	黄上京
	2024-05-20
Pages	90
Degree type	Master's
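Abstract (translated from Chinese)	Deep reinforcement learning combines the representation learning capability of deep learning with the decision-making capability of reinforcement learning, and has surpassed human experts on tasks such as Go, esports, and robot control. In the multi-agent domain, multi-agent reinforcement learning (MARL) shows how agents can be trained to cooperate or compete effectively in complex interactive environments, which matters for multi-agent systems such as coordinated UAV formation flight, traffic management for autonomous vehicles, and resource allocation. However, existing MARL research generally suffers from models that adapt poorly to different scenarios and problems: they lack the ability to adapt across tasks, and they lack the ability to adjust dynamically under different attribute compositions. This severely limits the real-world use of MARL, because real environments often involve multiple task requirements and dynamically changing agent attributes (capabilities such as speed and payload). To break through this bottleneck, this thesis aims, through algorithmic innovation, to improve the multi-task and attribute generalization capabilities of MARL, with the goal of developing intelligent cooperative systems that can adapt to real-world complexity and diversity. The thesis also validates the practical value of the proposed algorithms through physical robot experiments, helping move MARL from theory to practice.
For multi-task learning, this thesis proposes an attention-based policy network that flexibly integrates task and observation information, so that agents adjust their policies dynamically according to different task requirements. Through a task-entity Transformer architecture and a cross-attention design, the model adapts to different tasks and to changes in the number of agents. In addition, the thesis proposes a regret-based multi-task learning mechanism that automatically balances learning progress across tasks, so that the model performs well on all tasks and no single task dominates learning. The thesis also uses a pre-trained language model to process task descriptions, giving agents a prior understanding of the relationships between tasks and thereby improving generalization to new tasks. This design lets agents reuse existing knowledge in new tasks, substantially improving learning efficiency and adaptability.
For attribute generalization, this thesis proposes a context inference mechanism based on historical interactions, used to identify the latent variables of the attribute composition of a multi-agent system. This mechanism dynamically captures changes in agent attributes and provides key information for policy adaptation. A context-conditioned decision module allows agents to adapt their behavior more finely during cooperation, improving the cooperative efficiency and robustness of the multi-agent system. The thesis also strengthens the context representation through contrastive learning, so that agents can identify and coordinate their respective roles more precisely, further improving attribute generalization. The method overcomes the difficulty that traditional methods have with attribute changes and provides a basis for applying MARL in changing environments.
To evaluate the effectiveness and practicality of the proposed methods, this thesis builds a physical multi-robot experimental platform based on RoboMaster EP robots and conducts extensive experiments on it. The experiments cover changes in agent attributes, agent numbers, and task goals, examining the adaptability and performance of the algorithms. The results show that the proposed algorithms perform well across multiple scenarios and significantly outperform existing baseline methods in adaptability, stability, and cooperation efficiency. These results indicate that the proposed algorithms can cope with real-world dynamics and diversity, and they provide a basis for applying MARL in practice.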
English abstract	Deep reinforcement learning combines the representation learning capabilities of deep learning with the decision-making capabilities of reinforcement learning, and has surpassed human expert performance in tasks such as Go, esports, and robotic control. In multi-agent settings, multi-agent reinforcement learning (MARL) has shown how to train agents to collaborate or compete effectively in complex interactive environments, which is important for multi-agent systems such as unmanned aerial vehicle formation flight, traffic management for autonomous vehicles, and resource allocation. However, current MARL models commonly struggle to adapt to different scenarios and problems: they lack adaptability across tasks, and they cannot adjust dynamically to different attribute compositions. This limitation severely restricts the application of MARL in the real world, as real environments often involve diverse task requirements and dynamically changing agent attributes (capabilities such as speed and payload). To break through this bottleneck, this thesis aims to enhance the capability of MARL in multi-task learning and attribute generalization through algorithmic innovation, with the goal of developing intelligent collaboration systems that can adapt to the complexity and diversity of the real world. The thesis also validates the practical value of the proposed algorithms through physical robot experiments, helping to move MARL from theory to practice. For multi-task learning, this thesis introduces an attention-based policy network that flexibly integrates task and observation information, allowing agents to adjust their policies dynamically according to different task requirements. With a task-entity Transformer architecture and a cross-attention design, the model adapts to changes in tasks and in the number of agents, enhancing the flexibility and robustness of multi-agent systems in complex environments.
Moreover, this thesis proposes a regret-based multi-task learning mechanism that automatically balances learning progress across tasks, so that the model performs well on all tasks and learning is not dominated by a single task. The thesis also uses pre-trained language models to process task descriptions, giving agents a prior understanding of inter-task relationships and thereby improving the model's generalization to new tasks. This design allows agents to reuse existing knowledge in new tasks, significantly improving learning efficiency and adaptability. For attribute generalization, this thesis proposes a context inference mechanism based on historical interactions, which identifies the latent variables of the attribute composition of the multi-agent system. This mechanism dynamically captures changes in agent attributes, providing key information for policy adaptation. A decision-making module conditioned on this context allows agents to adapt their behavior more finely during collaboration, improving the efficiency and robustness of cooperation in the multi-agent system. Furthermore, the thesis enhances the context representation through contrastive learning, enabling agents to recognize and coordinate their roles more accurately and further improving attribute generalization. This approach overcomes the difficulty that traditional methods have in dealing with attribute changes, providing a foundation for applying MARL in changing environments. To assess the effectiveness and practicality of the proposed methods, this thesis constructs a physical multi-robot experimental platform based on RoboMaster EP robots and conducts extensive experiments on it. The experimental design covers changes in agent attributes, agent numbers, and task goals, examining the adaptability and performance of the algorithms.
The experimental results show that the proposed algorithms perform well across multiple scenarios and significantly outperform existing baseline methods in adaptability, stability, and collaboration efficiency. These results indicate that the proposed algorithms can cope with the dynamism and diversity of the real world, providing a foundation for applying MARL in practice.
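The abstract does not specify how the regret-based mechanism balances tasks. The sketch below is only one plausible reading, not the thesis's method: it assumes regret is the gap between a reference return and the current policy's return on each task, and that tasks are sampled for training with softmax probabilities over that regret. The names `regret_sampling_probs`, `sample_task`, `reference_returns`, and `temperature` are hypothetical.

```python
import math
import random


def regret_sampling_probs(current_returns, reference_returns, temperature=1.0):
    """Map per-task regret to task-sampling probabilities (illustrative only)."""
    # Assumed regret definition: reference return minus current return, clipped at 0.
    regrets = [max(ref - cur, 0.0)
               for cur, ref in zip(current_returns, reference_returns)]
    # Numerically stable softmax over temperature-scaled regrets.
    scaled = [r / temperature for r in regrets]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def sample_task(probs, rng=random):
    """Draw the index of the next training task according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Made-up returns for three tasks; the third task lags the most,
# so it receives the largest sampling probability.
current = [0.9, 0.4, 0.1]
reference = [1.0, 1.0, 1.0]
probs = regret_sampling_probs(current, reference, temperature=0.5)
next_task = sample_task(probs, random.Random(0))
```

Under these assumptions, tasks on which the policy lags furthest behind are trained more often, which matches the stated aim of keeping a single task from dominating learning; a lower `temperature` concentrates sampling more sharply on high-regret tasks.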
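Keywords	multi-agent reinforcement learning; multi-task reinforcement learning; multi-agent attribute composition generalization; physical multi-robot platform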
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/57101
Collection	Graduates_Master's theses
Recommended citation (GB/T 7714)	黄上京. 面向多任务和属性泛化的多智能体强化学习算法研究[D],2024.

Files in this item
File name/size	Document type	Version type	Access type	License
20240528-黄上京-面向多任务和属（15636KB）	Thesis		Restricted access	CC BY-NC-SA