|Place of Conferral||中国科学院自动化研究所|
|Keyword||Multi-agent collaboration; Relational network; Hierarchy; Heatmap; Interpretability|
The multi-agent decision-making problem has been a research hotspot in artificial intelligence in recent years, with wide application in robot collaboration, scheduling and dispatching systems, distributed systems, autonomous combat systems, resource management, and commodity recommendation. Existing research methods include planning and coordination, classical control, and reinforcement learning, among which the most successful and widely used is deep reinforcement learning. Reinforcement learning models the agent decision problem as a Markov decision process, in which the agent learns how to maximize the expected return or achieve a specific goal through interaction with the environment.
In recent years, deep reinforcement learning has achieved great success on various sequential decision-making problems, such as game AI, robot control, autonomous driving, and battlefield decision-making. Theories and algorithms for single-agent deep reinforcement learning have emerged rapidly and matured, while multi-agent deep reinforcement learning is still in its infancy. Compared with single-agent reinforcement learning, multi-agent reinforcement learning is more complex and difficult, and the depth and breadth of its problems far exceed those of the single-agent case.
The field of multi-agent reinforcement learning faces many unsolved problems, including unknown coordination mechanisms among agents, slow learning and training caused by the huge game space, hindered learning caused by sparse environmental information, poor interpretability of the algorithm model, and poor generalization performance. Current multi-agent deep reinforcement learning frameworks coordinate in a variety of ways; examples include the MADDPG and QMIX algorithms under the CTDE (centralized training, decentralized execution) architecture, as well as methods that use shared memory or communicate through hidden layers. An efficient coordination mechanism can improve the overall decision-making performance of multiple agents and also enable decision-makers to better understand the basis of each decision. The environmental information received by an agent typically cuts both ways. On the one hand, the agent needs to extract useful information from complex, redundant environmental input to shrink the exploration space and speed up learning; on the other hand, the agent may face overly sparse environment and reward signals, which severely hinder learning and training. The better current remedies for the sparse-reward problem include curriculum learning, reward shaping, curiosity-driven exploration, and hierarchical reinforcement learning. Poor interpretability and generalization are inherent problems of deep neural networks, and are especially prominent in end-to-end deep reinforcement learning. Integrating human knowledge into deep neural networks has long been a goal of researchers, and knowledge-data hybrid-driven decision-making is an idea proposed in recent years, but there is still no notably successful case of how to combine the two.
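As a concrete illustration of one sparse-reward remedy mentioned above, potential-based reward shaping adds a potential difference to the sparse environment reward without changing the optimal policy. The sketch below is illustrative only; the function name and the choice of potential are not from the thesis:

```python
# Minimal sketch of potential-based reward shaping (illustrative names).
# A potential function phi(s) encodes prior knowledge about how promising
# a state is; the shaping term F = gamma*phi(s') - phi(s) densifies the
# sparse reward while preserving the optimal policy.

def shaped_reward(reward, potential_s, potential_s_next, gamma=0.99):
    """Return the environment reward plus the potential-based shaping term."""
    return reward + gamma * potential_s_next - potential_s
```

With `gamma = 1`, moving from a state of potential 1.0 to one of potential 2.0 turns a zero environment reward into a shaped reward of 1.0, giving the agent a dense learning signal.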
Starting from the problems of single-agent and multi-agent decision-making, and building on current mainstream multi-agent deep reinforcement learning algorithms, this thesis focuses on collaborative hierarchical decision-making technology in multi-agent settings. The main contributions are summarized as follows:
(1) Multi-agent cooperation mechanism based on relational networks. To address data redundancy, low sample utilization, and the submergence of key collaborative information during multi-agent policy learning, a relational network module is designed to strengthen information interaction and collaboration among agents. Taking advantage of the high sensitivity of graph neural networks and the multi-head attention mechanism to non-Euclidean structured data, two relational networks are constructed to generate relational weights for multi-agent decision-making. The relationship information among agents is encoded into these weights, tightening the connections between agents and effectively promoting cooperation and communication.
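The relational weights described above can be sketched, in a much simplified form, as multi-head scaled dot-product attention over per-agent feature vectors. Everything here is illustrative: the thesis's networks are learned, whereas this sketch uses the raw features as both query and key:

```python
import numpy as np

def relation_weights(features, n_heads=2):
    """Toy multi-head attention producing an (n_agents, n_agents) relation matrix.

    features: (n_agents, d) per-agent embeddings, with d divisible by n_heads.
    Each row of the result sums to 1 and can be read as how much one agent
    attends to the others.
    """
    n, d = features.shape
    dk = d // n_heads
    weights = np.zeros((n, n))
    for h in range(n_heads):
        q = features[:, h * dk:(h + 1) * dk]      # query = key for this sketch
        scores = q @ q.T / np.sqrt(dk)            # scaled dot-product scores
        scores = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights += scores / scores.sum(axis=1, keepdims=True)  # row softmax
    return weights / n_heads                      # average over heads
```

In the thesis the weights feed each agent's decision; here they simply expose which agents a given agent's embedding is most similar to.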
(2) Multi-agent hierarchical decision-making architecture and mechanism. To address the poor interpretability of agent behavior policies, the huge game exploration space, and slow algorithm learning, several multi-agent hierarchical architectures and mechanisms are designed to reduce the exploration space, deepen collaboration among agents, and facilitate the learning of macro strategies. Prior knowledge is used to group actions into a macro-micro hierarchical policy architecture, which effectively reduces the exploration space of the algorithm and enables agents to cooperate at multiple strategy levels. A macro-policy implicit constraint optimization mechanism is proposed, which overcomes the agent's difficulty in learning and exploring macro policies and broadens and deepens the agent's cognition of the decision space.
(3) Mixed-policy decision mechanism based on the value function. To address the poor interpretability of the model and the diversity of input feature information, a value-function-based mixed-policy decision mechanism is designed to deepen the agent's cognition and extraction of input feature information. Mixing the policy outputs formed from different feature inputs improves the learning ability of the model. At the same time, analyzing the feature attention weights reveals how much attention the agent pays to different information when deciding, improving the interpretability of the decision.
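A minimal sketch of mixing per-feature-stream policy outputs with inspectable attention weights, under the assumption that each input feature stream produces its own action logits (all names illustrative):

```python
import numpy as np

def mix_policies(feature_logits, attention_scores):
    """Mix per-stream policy logits with softmax attention weights.

    feature_logits: (n_streams, n_actions) logits from each feature stream.
    attention_scores: (n_streams,) raw scores over the streams.
    Returns the mixed logits and the weights; inspecting the weights shows
    which feature stream dominated the decision.
    """
    w = np.exp(attention_scores - attention_scores.max())
    w = w / w.sum()                    # attention weights, sum to 1
    mixed = w @ feature_logits         # weighted mixture of stream logits
    return mixed, w
```

The returned `w` is what gives the interpretability claimed above: it is a probability vector over feature streams that can be logged per decision.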
(4) Heuristic decision-making mechanism based on a heatmap. To address the poor interpretability and the learning and training difficulty of deep reinforcement learning decision models, an interpretable heatmap-based heuristic decision model is designed to improve the performance of traditional deep reinforcement learning algorithms. Integrating expert knowledge into the construction of the heatmap makes the agent's decision output interpretable, and endowing the agent with initial capability through imitation learning improves the learning and exploration abilities of traditional deep reinforcement learning. This is a valuable exploration of the knowledge-data hybrid-driven decision-making model.
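A toy sketch of a heatmap heuristic: expert knowledge assigns a desirability value to each grid cell, and the heuristic steers the agent toward the best cell, so the basis of the decision is directly visible in the map. The grid and its values are invented for illustration:

```python
import numpy as np

def heatmap_action(value_map):
    """Return the (row, col) of the most desirable cell in the heatmap."""
    idx = np.unravel_index(np.argmax(value_map), value_map.shape)
    return idx

# Expert-assigned desirabilities for a 2x2 toy grid.
hm = np.array([[0.1, 0.3],
               [0.9, 0.2]])
# heatmap_action(hm) -> (1, 0): the heuristic steers toward the 0.9 cell.
```

Because the heatmap itself is the explanation, a human can audit any decision by looking at the map, unlike the hidden activations of an end-to-end network.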
|Subject Area||Artificial Intelligence Theory ; Computer Neural Networks ; Knowledge Engineering|
|MOST Discipline Catalogue||Engineering::Control Science and Engineering|
|张朋朋. 基于关系网络的多智能体协同分层决策技术[D]. 中国科学院自动化研究所. 中国科学院自动化研究所,2022.|
|Similar articles in Google Scholar|
|Similar articles in Baidu academic|
|Similar articles in Bing Scholar|