Research on Distributed Multi-Agent Cooperation Algorithms Guided by Global Information

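Title	Research on Distributed Multi-Agent Cooperation Algorithms Guided by Global Information
Author	陈逸群 (Chen Yiqun)
Date	2023-05
Pages	93
Degree type	Master's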
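Chinese abstract (translated)	In recent years, multi-agent systems have shown great application potential in scenarios such as smart cities, warehousing and logistics, and UAV air combat, and multi-agent cooperation has become a research focus. As an important way of solving decision-making problems, multi-agent cooperation algorithms based on deep reinforcement learning have allowed agents to exceed the level of top human players in real-time strategy games such as StarCraft II, and to defeat professional players in domains such as Texas Hold 'Em. Although multi-agent reinforcement learning algorithms have developed considerably in recent years, many problems still merit study. This thesis focuses on how to make full use of global information in multi-agent cooperative tasks while ultimately achieving decentralized execution. The main research contents are:
1. A multi-agent cooperation algorithm based on agent-specific global information. To address the question of "how to make full use of global information at decision time", a "Commander-Soldiers" multi-agent cooperation algorithm is proposed. Inspired by ball sports, the algorithm introduces a commander role into multi-agent cooperative tasks and models each agent as a soldier; it also introduces "agent-specific global information" into the soldiers' decision information. Experimental results show that the algorithm significantly improves the agents' cooperative performance and that the agents form more complex cooperative behaviors.
2. A multi-agent reinforcement learning paradigm based on personalized training and specific knowledge distillation. To address the problem that the "Commander-Soldiers" algorithm can only be executed in a centralized manner, a multi-agent reinforcement learning paradigm based on personalized training and specific knowledge distillation is proposed, so that agents can benefit from global information when making decisions and still execute in a decentralized manner. The paradigm further optimizes the network that generates agent-specific global information in the "Commander-Soldiers" algorithm, improving the performance of the centralized-execution algorithm. It also applies specific knowledge distillation, distilling the agent-specific global information using only local observations, to achieve fully decentralized execution. Experimental results show that the paradigm converts centralized execution into decentralized execution while retaining most of the performance of the centralized-execution algorithm, and that it generalizes well across different algorithms and scenarios.
3. A multi-agent cooperation algorithm based on dynamic soft-neighborhood consensus. In multi-agent cooperative tasks, "consensus" information among agents can improve cooperative performance. However, most current research on agent consensus is confined to an agent's neighborhood, which limits the extraction of consensus information. To address this, the concept of a "soft neighborhood" is introduced (a "soft neighborhood" is a set of weight matrices formed pairwise among all agents). Building on the paradigm based on personalized training and specific knowledge distillation, the "soft neighborhood" is generated from the agent-specific global information, and the consensus information among agents is then optimized on the basis of the "soft neighborhood", further improving cooperative performance. Experimental results show that the algorithm extracts consensus information among agents well and improves on several test metrics.
The three parts of this thesis build on one another. The first part explores and verifies that "agent-specific global information" improves multi-agent cooperative performance. The second part further optimizes the network that generates the specific global information and uses knowledge distillation to achieve decentralized execution. The third part generates a "soft neighborhood" from the specific global information and uses it to extract consensus information among agents, further improving cooperative performance. In the end, under partial observability, the agents both benefit from global information and execute in a fully decentralized manner, with a large performance improvement over current mainstream decentralized-execution algorithms.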
English abstract	In recent years, multi-agent systems have shown great application prospects in smart cities, warehousing and logistics, UAV air combat, and other scenarios, and multi-agent cooperation has become a research focus. As an important way to solve decision problems, multi-agent cooperation algorithms based on deep reinforcement learning have enabled agents to surpass top human players in real-time strategy games such as StarCraft II and to beat professional players in areas such as Texas Hold 'Em. Although multi-agent reinforcement learning has made great progress in recent years, many problems are still worth studying. This thesis focuses on how to make full use of global information in multi-agent cooperative tasks and ultimately realize decentralized execution. The main research contents include:
1. Research on a multi-agent cooperative algorithm based on agent-specific global information. To solve the problem of "how to make full use of global information in decision making", a "Commander-Soldiers" multi-agent cooperative algorithm is proposed. Inspired by ball games, the "Commander-Soldiers" algorithm introduces the role of a commander into the multi-agent cooperative task and models each agent as a soldier. It also introduces "agent-specific global information" into each soldier's decision information. Experimental results show that the algorithm significantly improves the cooperative performance of the agents and that more complex cooperative behaviors form among them.
2. Research on a multi-agent reinforcement learning paradigm based on personalized training and specific knowledge distillation. To solve the problem that the "Commander-Soldiers" algorithm can only be executed in a centralized way, a multi-agent reinforcement learning paradigm based on personalized training and specific knowledge distillation is proposed, so that agents can benefit from global information and still execute in a decentralized way. The paradigm further optimizes the network that generates agent-specific global information in the "Commander-Soldiers" algorithm and improves the performance of the centralized-execution algorithm. It also uses specific knowledge distillation, in which only local observations are used to distill the agent-specific global information, to realize fully decentralized execution. Experimental results show that the paradigm transforms centralized execution into decentralized execution while retaining most of the performance of the centralized-execution algorithm, and that it generalizes well to different algorithms and scenarios.
3. Research on a multi-agent cooperative algorithm based on dynamic soft-neighborhood consensus. In multi-agent cooperative tasks, "consensus" information among agents can improve cooperative performance. However, most current research on agent consensus is confined to the neighborhood of the agent, which limits the extraction of consensus information. To solve this problem, the concept of a "soft neighborhood" is introduced (a "soft neighborhood" is a set of weight matrices formed by all agents in pairs). Building on the paradigm based on personalized training and specific knowledge distillation, the "soft neighborhood" is generated from the agent-specific global information, and the consensus information among agents is then optimized based on the "soft neighborhood", further improving the agents' cooperative performance. Experimental results show that the algorithm extracts consensus information among agents well and improves on several test metrics.
The three parts of this thesis are progressive. The first part explores and verifies the role of "agent-specific global information" in improving multi-agent cooperative performance. The second part further optimizes the network that generates the specific global information and realizes decentralized execution by means of knowledge distillation. The third part generates a "soft neighborhood" from the specific global information and uses it to extract consensus information among agents, further improving cooperative performance. Finally, under partial observability, the agents both benefit from global information and realize fully decentralized execution, and performance is greatly improved compared with current mainstream decentralized-execution algorithms.
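Keywords	reinforcement learning, multi-agent cooperation, global information, knowledge distillation
Language	Chinese
Seven major directions: sub-direction classification	Multi-agent systems
State Key Laboratory planned direction classification	Multi-agent decision-making
Associated dataset requiring deposit	No
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/52206
Collection	Graduates_Master's theses
Recommended citation (GB/T 7714)	陈逸群. 全局信息指导下的分布式多智能体协作算法研究[D],2023.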

Files in this item
File name/size	Document type	Version type	Access type	License
陈逸群-硕士毕业论文（签字版）.pdf (42245KB)	Thesis		Restricted access	CC BY-NC-SA