融合信息素机制的大规模多智能体协同技术 (Large-Scale Multi-Agent Collaboration Technology Integrating a Pheromone Mechanism)
JiaHui Zhang
2023-06
Pages: 84
Degree type: Master's
Abstract

In recent years, multi-agent collaborative decision-making has become one of the research hotspots in artificial intelligence. In particular, multi-agent reinforcement learning, which combines deep reinforcement learning algorithms with multi-agent systems, has made considerable research progress and found wide application. In large-scale multi-agent reinforcement learning, however, the joint state and action spaces grow exponentially as the number of agents increases, and the dynamics of the environment become more complex. Methods designed for moderate numbers of agents do not apply to this setting, which therefore calls for dedicated research. Exploiting the fact that a pheromone mechanism can represent the situation of a swarm effectively and exchange information efficiently, this thesis designs such a mechanism and fuses it with multi-agent reinforcement learning algorithms, providing a new approach to cooperation among large numbers of agents. The main work is as follows:

A pheromone mechanism for group collaboration and information sharing is designed. Because the agents are so numerous, independent pairwise communication among large-scale agents is difficult; pheromones, by contrast, place few demands on inter-agent communication and can also represent the situation of the whole group. In this work, the agent's actions serve as the pheromone dimensions: the environmental reward associated with each action in the current state is deposited as the pheromone that the agent currently produces, and this pheromone propagates and decays in a medium of the same size as the environment. Other agents nearby sense the pheromone at their own positions and make decisions under the joint influence of the pheromone and their local observations, which improves decision-making and learning efficiency and realizes information sharing and efficient collaboration among the agents.
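
To make the mechanism concrete, the following is a minimal sketch of how such a pheromone medium could be implemented, assuming a 2D grid world with one pheromone channel per action. The class name PheromoneField and the decay and diffusion coefficients are illustrative assumptions, not values from the thesis.

```python
import numpy as np

class PheromoneField:
    """Sketch of the pheromone medium: one 2D grid per action dimension.

    Agents deposit per-action rewards at their cell; the field then
    diffuses to neighbouring cells and decays each step. The decay rate
    `rho` and diffusion weight `mu` are assumed, illustrative values.
    """

    def __init__(self, height, width, n_actions, rho=0.9, mu=0.1):
        self.field = np.zeros((n_actions, height, width), dtype=np.float32)
        self.rho = rho  # fraction retained after decay each step
        self.mu = mu    # fraction spread to the four neighbours

    def deposit(self, pos, action_rewards):
        """Deposit one pheromone value per action at the agent's cell."""
        y, x = pos
        self.field[:, y, x] += np.asarray(action_rewards, dtype=np.float32)

    def step(self):
        """Diffuse to the 4-neighbours, then decay the whole field."""
        f = self.field
        spread = np.zeros_like(f)
        spread[:, 1:, :] += f[:, :-1, :]   # shift down
        spread[:, :-1, :] += f[:, 1:, :]   # shift up
        spread[:, :, 1:] += f[:, :, :-1]   # shift right
        spread[:, :, :-1] += f[:, :, 1:]   # shift left
        self.field = self.rho * ((1 - self.mu) * f + (self.mu / 4) * spread)

    def sense(self, pos, radius=1):
        """Return the local pheromone patch an agent perceives at `pos`."""
        y, x = pos
        y0, x0 = max(0, y - radius), max(0, x - radius)
        return self.field[:, y0:y + radius + 1, x0:x + radius + 1]
```

Per environment step, each agent would call deposit with its per-action rewards, the field would advance once with step, and each agent would read sense(pos) as the pheromone part of its decision input.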

A large-scale multi-agent reinforcement learning framework fusing global pheromone information, GPQ, is designed. In a large-scale multi-agent environment, each agent usually obtains only local observations, whereas the global pheromone field represents the current situation of all agents and can therefore supply additional information. GPQ processes an agent's local observation and the global pheromone distribution jointly and uses convolutional attention to extract features from the global pheromone field, enabling agents to cooperate efficiently with their neighbours to complete tasks.
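
A minimal PyTorch sketch of a GPQ-style network is given below. The layer widths, the single-layer spatial attention, and the names GPQNet and ConvAttention are assumptions for illustration; the thesis's actual convolutional attention module may differ.

```python
import torch
import torch.nn as nn

class ConvAttention(nn.Module):
    """Simple spatial attention over pheromone feature maps (an assumed
    stand-in for the thesis's convolutional attention module)."""

    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):                      # x: (B, C, H, W)
        weights = torch.sigmoid(self.conv(x))  # (B, 1, H, W)
        return x * weights                     # re-weighted feature map

class GPQNet(nn.Module):
    """Sketch of a GPQ-style Q-network: an MLP encodes the local
    observation, a small CNN with attention encodes the global pheromone
    field, and the fused features produce per-action Q-values."""

    def __init__(self, obs_dim, n_actions, pher_channels, hidden=128):
        super().__init__()
        self.obs_enc = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.pher_enc = nn.Sequential(
            nn.Conv2d(pher_channels, 16, 3, padding=1), nn.ReLU(),
            ConvAttention(16),
            nn.AdaptiveAvgPool2d(1),           # (B, 16, 1, 1)
            nn.Flatten(),                      # (B, 16)
        )
        self.head = nn.Sequential(
            nn.Linear(hidden + 16, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs, pheromone_map):
        z = torch.cat([self.obs_enc(obs),
                       self.pher_enc(pheromone_map)], dim=-1)
        return self.head(z)                    # (B, n_actions)
```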

A large-scale multi-agent reinforcement learning framework fusing local pheromone information, LPQ, is designed. In practice, a global pheromone field runs into two problems: an agent's pheromone perception range is limited, and the large number of global pheromone features drives up the demand for computational resources. LPQ uses knowledge distillation to transfer the policy that an agent learns under global pheromone input to the setting where only local pheromone input is available. At execution time the agent can thus rely on local pheromone input for its decisions, which reduces the network size, while retaining the decision-making ability learned from the global pheromone input.
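
The distillation step could look like the following sketch, which assumes the common approach of matching temperature-softened action distributions between the global-pheromone teacher and the local-pheromone student. The temperature and the KL-divergence form are assumed choices, not confirmed details of LPQ.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_q, teacher_q, temperature=2.0):
    """Policy distillation from a global-pheromone teacher to a
    local-pheromone student (an assumed, standard formulation)."""
    # Soften both Q-value vectors into action distributions.
    log_p_student = F.log_softmax(student_q / temperature, dim=-1)
    p_teacher = F.softmax(teacher_q / temperature, dim=-1)
    # KL(teacher || student), averaged over the batch; the T^2 factor
    # is the usual gradient rescaling from Hinton-style distillation.
    return F.kl_div(log_p_student, p_teacher,
                    reduction="batchmean") * temperature ** 2

# Usage sketch: the teacher sees the global field, the student only
# its local patch; both networks and inputs are hypothetical names.
# teacher_q = teacher_net(obs, global_pheromone).detach()
# student_q = student_net(obs, local_pheromone)
# loss = distillation_loss(student_q, teacher_q)
```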

Experiments show that the proposed pheromone mechanism and network models effectively improve the collaboration efficiency of large-scale multi-agent systems. They outperform the baseline algorithms in robustness and in scalability with the number of agents, and achieve higher win rates in large-scale multi-agent battles, offering a new perspective and direction for the coordination of large-scale multi-agent systems.

Keywords: large-scale multi-agent collaboration; multi-agent reinforcement learning; pheromone mechanism; knowledge distillation
Language: Chinese
Sub-direction classification: Multi-agent systems
State Key Laboratory planning direction: Multi-agent decision-making
Document type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/52167
Collection: Graduates / Master's theses
Recommended citation (GB/T 7714):
JiaHui Zhang. 融合信息素机制的大规模多智能体协同技术[D], 2023.
Files in this item:
Name/size: 融合信息素机制的大规模多智能体协同技术. (5936KB)
Document type: Thesis
Access: Restricted
License: CC BY-NC-SA