Research on Decision-Making Methods for Large-Scale Swarm Intelligence Based on Deep Reinforcement Learning (基于深度强化学习的大规模群体智能决策方法研究)
付清旭
2024-05-13
Pages: 184
Subtype: Doctoral
Abstract

Swarm intelligence refers to a form of intelligence in which large numbers of agents in natural or artificial systems, through interaction and collaboration, exhibit capabilities that transcend the limits of individual intelligence. Compared with a single agent, swarm systems have unparalleled advantages in robustness, scalability, adaptability, and work efficiency. Swarms vary greatly in size across problem settings; when a large number of individuals gather together, or when smaller swarms merge, a large-scale swarm system can form. Such systems contain a very large number of agents, each of which may interact with hundreds or even thousands of others. These complex and diverse interaction relationships bring leaps in the swarm's spatio-temporal coverage, collective-intelligence potential, and task-completion capability. In addition, large-scale swarm systems help reduce the costs of system design, manufacturing, and maintenance, and therefore have broad application prospects in industrial and agricultural production, national defense and security, emergency rescue, and many other fields.

However, eliciting the emergent collective intelligence of large-scale swarm systems in real-world problems is a difficult challenge. In recent years, deep reinforcement learning, with its powerful learning and exploration capabilities, has opened a new research direction for large-scale swarm cooperative decision-making. Although existing deep reinforcement learning methods perform well on small-scale swarm problems, they face many limitations and challenges when extended to large-scale swarm systems with more complex agent relationships, including the reduction of high-dimensional observations, the handling of sparse rewards, system non-stationarity, and countering adversarial swarms that can themselves learn.

This dissertation takes large-scale swarm systems as its research subject. Aiming to improve the cooperative decision-making ability of swarms in complex task environments, it systematically studies deep-reinforcement-learning-based decision-making methods for large-scale swarm intelligence and addresses the key problems encountered when training cooperative policies for large-scale swarms. The main contributions are as follows:

(1) To address the curse of dimensionality in observations and the difficulty of information reduction in large-scale swarm environments, a concentration-network deep reinforcement learning method for large-scale swarm systems is proposed. Imitating the human cognitive process of concentration, the method builds a novel concentration network structure and a concentration policy gradient model. It scores the large number of observed environmental entities with multiple motivational indices, then ranks, clips, and aggregates the observed friendly and enemy entity information to extract the key features needed for decision-making, achieving better information reduction for large-scale swarms. Simulation results show that the method improves the swarm's ability to process high-dimensional observations and significantly improves the learning performance of cooperative decision-making.

(2) To address the sparse-reward problem and the poor interpretability of current swarm reinforcement learning, a hierarchical collaborative graph reinforcement learning method for large-scale swarms is proposed. The method builds a hierarchical, extensible cooperation graph model that abstracts the agents in the swarm as graph nodes and lets agent nodes form multiple sub-task execution groups to improve the efficiency of joint policy exploration. A set of cooperation-graph operators is designed to control the topology of the extensible cooperation graph, dynamically regulating the cooperative behavior of the whole swarm and guiding it to efficiently explore cooperative policies adapted to the environment. Simulation results show that the method not only improves the swarm's exploration efficiency and cooperative decision-making ability in sparse-reward environments, but also, through the extensible cooperation graph, improves the interpretability and scalability of the swarm's cooperative policies.

(3) To address the non-stationarity problem in training large-scale heterogeneous swarms, a general-purpose heterogeneous-league swarm deep reinforcement learning method is proposed. The method makes full use of the intermediate historical policies generated during policy iteration to build a heterogeneous league learning mechanism in which heterogeneous agents of different capability levels cooperate, reducing the negative impact of non-stationarity. A hypernetwork structure is designed to encourage agents to actively adopt different cooperative policies according to their teammates' abilities and characteristics, improving the agents' ability to learn cooperative skills in large-scale heterogeneous swarms. In simulation, the method exhibits superior cooperative policy learning ability and good scalability.

(4) To address multi-faction large-scale swarm confrontation against opponent swarms with active learning ability, a fuzzy-feedback competitive policy learning method for large-scale swarms is proposed. The method builds a fuzzy feedback control model that dynamically adjusts the configuration of the swarm's policy optimizer, tuning key reinforcement learning settings, including intrinsic rewards, according to the real-time situation so as to suppress the opponent and quickly seize the competitive advantage. A fuzzy-logic auto-tuning algorithm based on Bayesian optimization is designed so that the swarm can autonomously distill experience from repeated confrontations, respond quickly to changes in enemy strategy, and improve its awareness of the competitive situation and its ability to counter enemy strategies. Simulation experiments show that the method significantly improves the swarm's competitiveness in dynamic multi-swarm competitive environments and has good interpretability and transferability.

(5) To address the shortcomings of existing swarm deep reinforcement learning simulation environments and training frameworks, such as small simulation scale, lack of multi-team confrontation support, and low training efficiency, several large-scale swarm simulation platforms and training toolchains are developed. First, two large-scale swarm simulation platforms are designed: the Decentralised Collective Assault (DCA) environment and the Unreal-based Multi-Agent Playground (Unreal-MAP), a construction platform for large-scale swarm simulation environments, with innovative features including heterogeneous-agent support, multi-faction team support, and scalable environment design. Second, the Hybrid Multi-Agent Platform (HMAP), a hybrid training framework for large-scale swarms, is built. Starting from the needs of multi-team, large-scale swarm deep reinforcement learning research, it decouples the framework, the simulation environments, and the reinforcement learning algorithms, supporting environment simulation and policy training for dynamic multi-team confrontation scenarios at the architectural level and underpinning a large number of swarm simulation environments and swarm deep reinforcement learning algorithms. These self-developed simulation platforms and toolchains are widely recognized in the open-source community and strongly support the algorithm research in the other chapters.

Overall, this research focuses on improving the cooperative decision-making capability of large-scale swarms. It investigates the difficulties of learning cooperative policies under both single-swarm cooperation and multi-swarm confrontation, and on this basis proposes a series of deep-reinforcement-learning-based decision-making methods for large-scale swarm intelligence. These results have both important theoretical significance and practical application value.

Other Abstract

Swarm intelligence refers to a form of intelligence in which many agents within natural or artificial systems, through interaction and collaboration, exhibit emergent behaviors and capabilities that transcend individual intelligence. Compared to individual agents, swarm systems possess unparalleled advantages in robustness, scalability, adaptability, and work efficiency. Swarm sizes vary greatly across problem settings, and large-scale swarm systems can be formed by the gathering of a large number of individuals or the merging of smaller swarms. These large-scale swarm systems comprise an immense number of agents, each of which can establish interactions with hundreds or even thousands of others. The complex and diverse relationships within the swarm lead to a leap in spatio-temporal coverage, collective potential, and task-completion capability. Moreover, large-scale swarm systems also help reduce the costs of system design, manufacturing, and maintenance; they therefore have extensive application prospects in many fields.

However, stimulating the emergent capabilities of large-scale swarm intelligence in real-world problems presents a formidable challenge. In recent years, deep reinforcement learning, with its powerful learning and exploration abilities, has provided a new research direction for solving large-scale swarm collaborative decision-making problems. While existing deep reinforcement learning methods perform well on small-scale swarm problems, they face numerous limitations and challenges when extended to large-scale systems with more complex agent-to-agent relationships, including the reduction of high-dimensional observation information, the handling of sparse rewards, system non-stationarity, and countering adversarial swarms with learning capabilities.
This work takes large-scale swarm systems as the research subject, aiming to enhance the collaborative decision-making abilities of swarms in complex task environments. It systematically carries out research on decision-making methods for large-scale swarm intelligence based on deep reinforcement learning, addressing key issues encountered during the training of collaborative strategies for large-scale swarms. The main work and contributions of this dissertation are summarized as follows:

(1) To deal with the curse of dimensionality and the information-reduction problem in large-scale swarm environments, a deep reinforcement learning method with a concentration network is proposed for large-scale swarm systems. The method mimics the human cognitive process of concentration by establishing a novel concentration network structure and a concentration policy gradient model. It scores the multitude of observed environmental entities using various motivational indices, then ranks, clips, and aggregates the observed friend-or-foe entity information to extract key features for decision-making, achieving improved information reduction in large-scale swarms. Simulation results show that the method improves the ability of large-scale swarms to process high-dimensional observation information and significantly improves the swarm's cooperative decision-making capability.
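The score-rank-clip-aggregate pipeline described above can be sketched as follows. This is a minimal illustration, not the dissertation's actual model: the distance-based score stands in for the learned motivational indices, and all names (`concentrate`, `self_state`, `entities`) are hypothetical.

```python
import numpy as np

def concentrate(self_state, entities, k=8):
    """Score observed entities, keep the top-k, and aggregate them
    into a fixed-size feature vector for the policy network."""
    # Score each entity; here a simple negative-distance score is a
    # hypothetical stand-in for the learned motivational indices.
    dists = np.linalg.norm(entities[:, :2] - self_state[:2], axis=1)
    scores = -dists
    # Rank and clip: keep only the k most relevant entities.
    top = np.argsort(scores)[::-1][:k]
    selected = entities[top]
    # Aggregate the clipped set; the output size no longer depends on
    # how many entities were observed.
    return selected.mean(axis=0)

# Example: 100 observed entities, each a 4-d feature (x, y, vx, vy).
rng = np.random.default_rng(0)
obs = concentrate(np.zeros(4), rng.normal(size=(100, 4)), k=8)
print(obs.shape)
```

Whatever the entity count, the aggregated feature has a fixed shape, which is what makes the policy input dimension independent of swarm size.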

(2) To deal with the sparse-reward and interpretability problems faced by current swarm reinforcement learning methods, a Hierarchical Collaborative Graph reinforcement Learning (HCGL) method is proposed for large-scale swarms. This method abstracts agents into a series of graph nodes within a dynamic graph structure, the Extensible Cooperation Graph (ECG), together with a set of special individuals known as ECG operators, allowing agents to form multiple sub-task-executing clusters. By driving the topological structure of the ECG, the method dynamically adjusts the swarm's collaborative behavior, guiding the swarm to efficiently explore cooperative strategies adapted to dynamic changes in the environment. Simulation results show that the method improves the swarm's exploration efficiency and cooperative decision-making ability in sparse-reward environments, and, through the ECG, also improves the interpretability and scalability of the learned cooperative policies.
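The graph idea can be made concrete with a toy sketch: agents are leaf nodes attached to cluster nodes, and operator actions rewire the agent-to-cluster edges. All names (`CooperationGraph`, `apply_operator`) are illustrative assumptions, not the dissertation's actual ECG implementation.

```python
class CooperationGraph:
    """Toy extensible cooperation graph: each agent node is attached
    to one cluster node, and clusters correspond to sub-task teams."""

    def __init__(self, n_agents, n_clusters):
        self.n_clusters = n_clusters
        # Edge list: the cluster index of each agent (all start in 0).
        self.assignment = [0] * n_agents

    def apply_operator(self, agent, cluster):
        """An operator action rewires one agent-to-cluster edge,
        changing the swarm's division of labor."""
        assert 0 <= cluster < self.n_clusters
        self.assignment[agent] = cluster

    def clusters(self):
        """Group agents by cluster, i.e. read off the current teams."""
        groups = {c: [] for c in range(self.n_clusters)}
        for agent, c in enumerate(self.assignment):
            groups[c].append(agent)
        return groups

g = CooperationGraph(n_agents=6, n_clusters=3)
g.apply_operator(agent=4, cluster=2)   # operator rewires one edge
g.apply_operator(agent=5, cluster=2)
print(g.clusters())
```

Because the policy acts on a handful of graph edges rather than on every agent's raw action, the same learned topology-control behavior is readable by a human and reusable at other swarm sizes.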

(3) To deal with the non-stationarity issue in training large-scale heterogeneous swarms, a general-purpose heterogeneous-league swarm deep reinforcement learning method is proposed. The method fully utilizes the historical policies generated during iterative policy training, establishing a heterogeneous league training mechanism in which heterogeneous agents of different capability levels collaborate, which reduces the negative impact of non-stationarity. A hypernetwork structure is designed to encourage agents to actively adopt different cooperative policies based on their teammates' abilities and characteristics, improving the collaborative skill-learning ability of agents in large-scale heterogeneous teams. Simulation results show that the method exhibits superior performance and good scalability in cooperative swarm policy learning.
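The league mechanism's core bookkeeping, keeping frozen snapshots of past policies and sampling them as partners of varying skill, can be sketched as below. `Policy`, `League`, and the snapshot interval are hypothetical placeholders; the real method's snapshot and matchmaking rules are more elaborate.

```python
import copy
import random

class Policy:
    """Placeholder for a real policy network; only a version tag."""
    def __init__(self, version):
        self.version = version

class League:
    """Keep frozen snapshots of past policies and sample them as
    teammates, so the learner faces a mixture of capability levels
    rather than a single co-adapting (non-stationary) partner."""

    def __init__(self, snapshot_every=100):
        self.pool = []
        self.snapshot_every = snapshot_every

    def maybe_snapshot(self, policy, step):
        if step % self.snapshot_every == 0:
            # Deep-copy so later training cannot mutate the snapshot.
            self.pool.append(copy.deepcopy(policy))

    def sample_teammates(self, n):
        return [random.choice(self.pool) for _ in range(n)]

league = League(snapshot_every=100)
for step in range(0, 301, 100):
    league.maybe_snapshot(Policy(version=step), step)
teammates = league.sample_teammates(3)
print(len(league.pool), [p.version for p in teammates])
```

Training against this mixture stabilizes learning because the distribution of partners changes slowly even while the current policy keeps improving.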

(4) To deal with large-scale multi-team competition against opponent swarms that can themselves learn, a fuzzy-feedback competitive swarm policy learning method is proposed. The method builds a fuzzy feedback control model that dynamically adjusts key reinforcement learning settings, including intrinsic rewards, according to the real-time situation, aiming to suppress the enemy and quickly seize the competitive advantage. A fuzzy-logic auto-tuning algorithm based on Bayesian optimization enables the swarm to distill experience from multiple historical confrontations, respond rapidly to changes in the adversary's strategy, and strengthen its ability to counter the opponent. Simulation results show that the method significantly improves the competitive capability of large-scale swarm reinforcement learning in dynamic multi-team environments, with good interpretability and transferability.
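A minimal sketch of fuzzy feedback over one training knob: a situation signal (here, recent win rate) is fuzzified into "losing" / "neutral" / "winning" memberships, and weighted rules set the intrinsic-reward coefficient. The membership shapes and rule outputs are illustrative assumptions, not the dissertation's tuned values; in the actual method such parameters would be the quantities optimized by Bayesian optimization across confrontations.

```python
def fuzzy_intrinsic_coef(win_rate):
    """Map a situation signal in [0, 1] to an intrinsic-reward
    coefficient via simple fuzzy rules (illustrative values)."""
    # Fuzzification: piecewise-linear memberships on [0, 1].
    losing = max(0.0, 1.0 - 2.0 * win_rate)    # high when win_rate < 0.5
    winning = max(0.0, 2.0 * win_rate - 1.0)   # high when win_rate > 0.5
    neutral = 1.0 - losing - winning
    # Rule outputs: explore more when losing, exploit when winning.
    coef = {"losing": 0.5, "neutral": 0.1, "winning": 0.0}
    weights = {"losing": losing, "neutral": neutral, "winning": winning}
    # Defuzzification by weighted average of the rule outputs.
    total = sum(weights.values())
    return sum(coef[k] * w for k, w in weights.items()) / total

print(fuzzy_intrinsic_coef(0.2))  # leaning toward exploration
print(fuzzy_intrinsic_coef(0.9))  # mostly exploitation
```

The appeal of the fuzzy form is interpretability: each rule reads as a plain statement ("if losing, explore more"), while the numeric rule outputs remain free parameters that an outer optimizer can tune.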

(5) To address the shortcomings of current swarm deep reinforcement learning simulation environments and training frameworks, such as small simulation scale, lack of support for multi-team competition, and low training efficiency, a variety of large-scale swarm simulation platforms and training toolchains have been developed. First, two large-scale swarm simulation platforms, the Decentralised Collective Assault (DCA) environment and the Unreal-based Multi-Agent Playground (Unreal-MAP) environment-construction platform, have been designed, featuring support for heterogeneous agents, multi-faction teams, and scalable environment design. Second, the Hybrid Multi-Agent Platform (HMAP), a large-scale swarm training framework, has been constructed. Driven by the needs of multi-team, large-scale swarm deep reinforcement learning research, it decouples the framework, the simulation environments, and the algorithms, supports dynamic adversarial training of multiple teams with multiple algorithms at the architectural level, and underpins a wide array of swarm simulation environments and deep reinforcement learning algorithms. These self-designed simulation platforms and toolchains have been widely recognized in the open-source community and have effectively supported the algorithm research in the other chapters.

Overall, this research aims to enhance the collaborative decision-making capabilities of large-scale swarms. It delves into the challenges of learning cooperative strategies under single-team cooperation as well as multi-team competition, and on this basis proposes a series of intelligent decision-making methods for large-scale swarms based on deep reinforcement learning. These research findings have significant theoretical implications as well as practical application value in areas such as urban security, emergency rescue, and military confrontation.

 

Keywords: large-scale, swarm systems, cooperation, decision-making, deep reinforcement learning, multi-agent systems
Language: Chinese
Document Type: Dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/56680
Collection: 毕业生_博士学位论文 (Graduates, Doctoral Dissertations)
Recommended Citation (GB/T 7714):
付清旭. 基于深度强化学习的大规模群体智能决策方法研究[D],2024.
Files in This Item:
File Name/Size: 2024-6-4183-付清旭学位论文. (39071 KB) | DocType: 学位论文 (Dissertation) | Access: Restricted (限制开放) | License: CC BY-NC-SA
