Research on Swarm Collaborative Decision-Making Methods Based on Deep Reinforcement Learning


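	Research on Swarm Collaborative Decision-Making Methods Based on Deep Reinforcement Learning
	Wu Shiguang (吴士广)
	2022-05-24
Pages	136
Degree type	Doctoral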
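Abstract (translated from Chinese)	Swarm intelligence is a form of intelligence in which many individuals (agents) produce complex collective behavior through interaction and cooperation. It offers new solutions to many highly challenging problems and has broad application prospects and value in fields such as urban security, emergency rescue, and military confrontation. Swarm collaborative decision-making is an important problem in swarm intelligence and has attracted researchers from many fields. However, because swarm environments are complex, dynamic, and changeable, improving swarm collaborative decision-making is difficult. In recent years, deep reinforcement learning, with its strong self-learning and exploration capabilities, has offered a new approach to the problem. Existing deep reinforcement learning methods nevertheless still face many problems and challenges in improving swarm collaborative decision-making in cooperative and adversarial scenarios. This dissertation therefore takes improving swarm collaborative decision-making as its research goal and, following a progression from few to many individuals, from simple to complex swarm tasks, and from homogeneous to heterogeneous collaborators, studies two typical scenarios: cooperation and confrontation. First, starting from the local observation and local communication that characterize swarm environments, it studies a swarm collaborative decision-making method based on graph neural networks and attention mechanisms. Second, for cooperative scenarios with complex multiple tasks, it studies a swarm collaborative decision-making method driven jointly by domain knowledge and data. Third, for the problem of unknown opponent strategies in adversarial swarm environments, it studies a collaborative adversarial method based on relational graph reasoning. Finally, for the need for diverse strategies in adversarial swarm environments, it studies a cognition-driven method for learning diverse swarm strategies. The main work and contributions are summarized as follows:
1. To address the dynamically changing information caused by local observation and local communication in swarm environments, a swarm collaborative decision-making method based on graph neural networks and attention mechanisms is proposed. On the one hand, prior knowledge is used to group the observation information, and an observation-grouping attention network is designed to process each group separately, improving the agents' ability to handle dynamically changing information in dynamic environments. On the other hand, an intention communication network is designed on the basis of the graph attention mechanism to transmit the agents' observation intentions, improving their understanding of the environment and promoting cooperation among them. Simulation results show that the method effectively handles dynamically changing information and improves the agents' collaborative decision-making in dynamic environments.
2. To address the difficulty of learning swarm collaborative strategies for complex multiple tasks, a swarm collaborative decision-making method driven jointly by domain knowledge and data is proposed, with multi-target coverage under connectivity maintenance as the background task. Based on domain knowledge of connectivity maintenance and target coverage, a two-stage reward function is designed to guide the learning of collaborative coverage strategies under connectivity maintenance and to improve strategy performance step by step. In addition, based on a connectivity maintenance control model, a connectivity-guaranteed action filter is designed to filter out actions that would break the swarm's communication links, ensuring the reliability of connectivity maintenance in the collaborative coverage strategy. Both simulation and physical experiments show that the method covers as many targets as possible while keeping the swarm's communication topology connected.
3. To address unknown opponent strategies in adversarial swarm environments, a collaborative adversarial decision-making method based on relational graph reasoning is proposed. Agent graph reasoning and opponent graph reasoning are designed separately on the basis of the graph attention mechanism to extract agent-level and opponent-level representations. In addition, the inferred opponent-level representation is used, through an intrinsic reward, to predict the opponents' future states, which effectively handles the case where the opponents' action space is unknown. Simulation results show that the method effectively handles unknown opponent strategies and improves the agents' collaborative decision-making in adversarial environments.
4. To address the difficulty of learning diverse strategies in adversarial swarm environments, a cognition-driven method for learning diverse swarm strategies is proposed. A situational cognition module and a self-cognition module are designed based on each agent's local trajectory. Regularization terms based on mutual information theory are designed for each module to ensure that situational cognition and self-cognition are learned effectively and accurately. Finally, a cognition parameterization mechanism encodes an agent's two kinds of cognition into decision parameters that carry the agent's individuality, promoting the generation of diverse strategies. Simulation results show that the proposed method learns effective cognition and effective diverse strategies, improving learning speed and performance.
Overall, starting from improving swarm collaborative decision-making in the two typical scenarios of cooperation and confrontation, this dissertation studies in depth the dynamically changing information and the difficulty of learning collaborative strategies for complex multiple tasks in cooperative environments, as well as unknown opponent strategies and the difficulty of learning diverse strategies in adversarial environments. It proposes a series of swarm collaborative decision-making methods based on deep reinforcement learning, and the results have significant theoretical and practical value.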
English abstract	Swarm intelligence is a form of intelligence in which many individuals (agents) give rise to complex swarm behaviors through interaction and collaboration. It provides new solutions to many challenging problems and has broad application prospects and value in urban security, emergency rescue, military confrontation, and other fields. Swarm collaborative decision-making is an important issue in swarm intelligence and has attracted the attention of researchers in many fields. However, because the swarm environment is complex and dynamic, it is difficult to improve swarm collaborative decision-making ability. In recent years, deep reinforcement learning, with its strong self-learning and exploration ability, has provided a new solution for swarm collaborative decision-making. Existing deep reinforcement learning methods, however, still face many problems and challenges in improving swarm collaborative decision-making ability in cooperative and adversarial scenarios. Therefore, with the aim of improving swarm collaborative decision-making ability, this dissertation studies swarm collaborative decision-making methods along a progression from fewer to more agents, from simple to complex swarm tasks, and from homogeneous to heterogeneous collaboration partners, focusing on two typical scenarios: cooperation and confrontation. Firstly, based on the characteristics of local observation and local communication in the swarm environment, a swarm collaborative decision-making method based on graph neural networks and attention mechanisms is studied. Secondly, for the complex multi-task cooperation scenario, a swarm collaborative decision-making method driven jointly by domain knowledge and data is studied. Thirdly, for the problem that the opponent strategy is unknown in the adversarial swarm environment, a collaborative adversarial method based on relational graph reasoning is studied. Finally, to meet the demand for diverse strategies in the adversarial swarm environment, a cognition-driven method for learning diverse swarm strategies is studied. The main work and contributions of this dissertation are summarized as follows:
1. To deal with the dynamically changing information caused by local observation and local communication in the swarm environment, a swarm collaborative decision-making method based on graph neural networks and attention mechanisms is proposed. On the one hand, the observation information is grouped using prior knowledge, and an observation grouping attention network is designed to process the grouped information separately, improving the agents' ability to process dynamic information. On the other hand, based on the graph attention mechanism, an intention communication network is designed to transmit the observation intentions of the agents, improve their understanding of the environment, and promote cooperation among them. The simulation results indicate that the method can effectively handle dynamic information and improve the collaborative decision-making ability of the agents in a dynamic and changing environment.
2. To deal with the problem that swarm collaborative strategies for complex multi-tasks (multi-target coverage and connectivity maintenance) are difficult to learn, a swarm collaborative decision-making method driven jointly by domain knowledge and data is proposed. Based on the domain knowledge of connectivity maintenance and target coverage, a two-stage reward function is designed to guide the learning of collaborative coverage strategies under connectivity maintenance, gradually improving the performance of the strategy. Furthermore, based on the connectivity maintenance control model, a connectivity-guaranteed action filter is designed to filter out actions that would disconnect swarm communication links, so that connectivity is reliably maintained in the collaborative coverage strategy. Both simulation and physical experiments show that the method can cover as many targets as possible while keeping the swarm communication topology connected.
3. To deal with the problem that the opponent strategy is unknown in the adversarial swarm environment, a swarm collaborative adversarial decision-making method based on relational graph reasoning is proposed. Based on the graph attention mechanism, agent graph reasoning and opponent graph reasoning are designed to extract representation information at the agent level and the opponent level. In addition, the opponent-level representation is used to predict the future state of the opponents through intrinsic rewards, which effectively handles the case where the action space of the opponents is unknown. The simulation results demonstrate that the method can effectively deal with unknown opponent strategies and improve the collaborative decision-making ability of the agents in the adversarial environment.
4. To deal with the problem that diverse strategies are difficult to learn in the adversarial swarm environment, a cognition-driven method for learning diverse swarm strategies is proposed. A situational cognition module and a self-cognition module are designed based on the local trajectory of each agent. In addition, two regularization terms based on mutual information theory are designed to ensure that the learned situational cognition and self-cognition are effective and accurate. Finally, a cognition parameterization mechanism is designed to encode the two kinds of cognition of each agent into decision-making parameters with individuality, facilitating the generation of diverse strategies. The simulation results show that the proposed method can learn effective cognition and diverse strategies, improving learning speed and performance.
To sum up, starting from improving swarm collaborative decision-making ability in the two typical scenarios of cooperation and confrontation, this dissertation proposes a series of swarm collaborative decision-making methods based on deep reinforcement learning. These methods address dynamically changing information and the difficulty of learning collaborative strategies for complex multi-tasks in the cooperative environment, as well as unknown opponent strategies and the difficulty of learning diverse strategies in the adversarial environment. The research results have important theoretical and practical application value.
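Keywords	Swarm systems; collaborative decision-making; deep reinforcement learning; multi-agent reinforcement learning; graph attention network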
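Language	Chinese
Document type	Dissertation
Item identifier	http://ir.ia.ac.cn/handle/173211/48716
Collection	Graduates_Doctoral Dissertations
Corresponding author	Wu Shiguang (吴士广)
Recommended citation (GB/T 7714)	Wu Shiguang. Research on Swarm Collaborative Decision-Making Methods Based on Deep Reinforcement Learning [D]. Institute of Automation, Chinese Academy of Sciences. Institute of Automation, Chinese Academy of Sciences, 2022.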

Files in this item
File name/size	Document type	Version type	Access type	License
基于深度强化学习的群体协同决策方法研究. (14260 KB)	Dissertation		Restricted access	CC BY-NC-SA
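
Illustrative note on contribution 2 of the abstract: the connectivity-guaranteed action filter removes actions that would disconnect the swarm's communication links. The abstract does not give the filter's details (it is said to be based on a connectivity maintenance control model), so the following is only a minimal sketch of the general idea under simplifying assumptions: agents are points in the plane, two agents can communicate when their distance is at most `comm_range`, only one agent moves at a time, and connectivity is checked by breadth-first search. The names `is_connected`, `filter_actions`, and `comm_range` are hypothetical and are not taken from the dissertation.

```python
from collections import deque
import math


def is_connected(positions, comm_range):
    """Return True if the disk communication graph over `positions` is connected.

    Assumption: two agents are linked when their Euclidean distance
    is at most `comm_range`.
    """
    n = len(positions)
    if n <= 1:
        return True
    seen = {0}
    queue = deque([0])
    while queue:  # breadth-first search from agent 0
        i = queue.popleft()
        for j in range(n):
            if j not in seen and math.dist(positions[i], positions[j]) <= comm_range:
                seen.add(j)
                queue.append(j)
    return len(seen) == n


def filter_actions(positions, agent, candidate_moves, comm_range):
    """Keep only the moves of `agent` that leave the graph connected.

    Assumption: all other agents stay where they are during the check.
    """
    safe = []
    for dx, dy in candidate_moves:
        moved = list(positions)
        x, y = positions[agent]
        moved[agent] = (x + dx, y + dy)
        if is_connected(moved, comm_range):
            safe.append((dx, dy))
    return safe


# Three agents in a line; agent 2 considers four candidate moves.
positions = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
moves = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, 0.0)]
safe_moves = filter_actions(positions, 2, moves, comm_range=1.5)
# Moving right by 1.0 would put agent 2 at distance 2.0 from its nearest
# neighbour, so that move is filtered out; the other three are kept.
```

In a reinforcement learning loop, a filter of this kind would typically be applied to the policy's candidate actions before one is executed, so that the learned coverage behavior never selects a link-breaking move. How the dissertation integrates its filter with the policy is not stated in this record.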