|Place of Conferral||中国科学院自动化研究所|
|Keyword||Multi-agent collaboration; Relational network; Hierarchy; Heatmap; Interpretability|
The multi-agent decision-making problem has been a research hotspot in artificial intelligence in recent years, with wide application in robot collaboration, scheduling and dispatching systems, distributed systems, autonomous combat systems, resource management, and commodity recommendation. Existing research methods include planning and coordination, classical control, and reinforcement learning, among which the most successful and widely used is deep reinforcement learning. Reinforcement learning models the agent decision problem as a Markov decision process, in which the agent learns how to maximize the expected return or achieve a specific goal through interaction with the environment.
In recent years, deep reinforcement learning has achieved great success on various sequential decision-making problems, such as game AI, robot control, autonomous driving, and battlefield decision-making. Theories and algorithms for single-agent deep reinforcement learning have emerged rapidly and matured, while multi-agent deep reinforcement learning is still in its infancy. Compared with single-agent reinforcement learning, multi-agent reinforcement learning is more complex and difficult, and the depth and breadth of its problems far exceed those of the single-agent case.
The field of multi-agent reinforcement learning faces many unsolved problems, including unknown coordination mechanisms among agents, slow learning and training caused by the huge game space, hindered learning caused by sparse environmental information, poor interpretability of the algorithm model, and poor generalization performance. Current multi-agent deep reinforcement learning frameworks coordinate in a variety of ways; examples include the MADDPG and QMIX algorithms under the CTDE (centralized training, decentralized execution) architecture, as well as methods that use shared memory or communicate through hidden layers. An efficient coordination mechanism can improve the overall decision-making performance of multiple agents and also enable decision-makers to better understand the basis of each decision. The environmental information received by an agent typically cuts both ways. On the one hand, the agent needs to extract useful information from complex, redundant environmental input to shrink the exploration space and speed up learning; on the other hand, the agent may face overly sparse environment and reward signals, which severely hinder learning and training. The better current remedies for the sparse-reward problem include curriculum learning, reward shaping, curiosity-driven exploration, and hierarchical reinforcement learning. Poor interpretability and generalization are inherent problems of deep neural networks, and are especially prominent in end-to-end deep reinforcement learning. Integrating human knowledge into deep neural networks has long been a goal of researchers, and knowledge-data hybrid-driven decision-making is an idea proposed in recent years, but there is still no notably successful case of how to combine the two.
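As a concrete illustration of one sparse-reward remedy mentioned above, potential-based reward shaping adds a potential difference to the sparse environment reward without changing the optimal policy. The sketch below is illustrative only; the function name and the choice of potential are not from the thesis:

```python
# Minimal sketch of potential-based reward shaping (illustrative names).
# A potential function phi(s) encodes prior knowledge about how promising
# a state is; the shaping term F = gamma*phi(s') - phi(s) densifies the
# sparse reward while preserving the optimal policy.

def shaped_reward(reward, potential_s, potential_s_next, gamma=0.99):
    """Return the environment reward plus the potential-based shaping term."""
    return reward + gamma * potential_s_next - potential_s
```

With `gamma = 1`, moving from a state of potential 1.0 to one of potential 2.0 turns a zero environment reward into a shaped reward of 1.0, giving the agent a dense learning signal.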
Starting from the problems of single-agent and multi-agent decision-making, and building on current mainstream multi-agent deep reinforcement learning algorithms, this thesis focuses on collaborative hierarchical decision-making technology in multi-agent settings. The main contributions are summarized as follows:
(1) Multi-agent cooperation mechanism based on relational networks. To address data redundancy, low sample utilization, and the submergence of key collaborative information during multi-agent policy learning, a relational network module is designed to strengthen information interaction and collaboration among agents. Taking advantage of the high sensitivity of graph neural networks and the multi-head attention mechanism to non-Euclidean structured data, two relational networks are constructed to generate relational weights for multi-agent decision-making. The relationship information among agents is encoded into these weights, tightening the connections between agents and effectively promoting cooperation and communication.
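The relational weights described above can be sketched, in a much simplified form, as multi-head scaled dot-product attention over per-agent feature vectors. Everything here is illustrative: the thesis's networks are learned, whereas this sketch uses the raw features as both query and key:

```python
import numpy as np

def relation_weights(features, n_heads=2):
    """Toy multi-head attention producing an (n_agents, n_agents) relation matrix.

    features: (n_agents, d) per-agent embeddings, with d divisible by n_heads.
    Each row of the result sums to 1 and can be read as how much one agent
    attends to the others.
    """
    n, d = features.shape
    dk = d // n_heads
    weights = np.zeros((n, n))
    for h in range(n_heads):
        q = features[:, h * dk:(h + 1) * dk]      # query = key for this sketch
        scores = q @ q.T / np.sqrt(dk)            # scaled dot-product scores
        scores = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights += scores / scores.sum(axis=1, keepdims=True)  # row softmax
    return weights / n_heads                      # average over heads
```

In the thesis the weights feed each agent's decision; here they simply expose which agents a given agent's embedding is most similar to.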
(2) Multi-agent hierarchical decision-making architecture and mechanism. To address the poor interpretability of agent behavior policies, the huge game exploration space, and slow algorithm learning, several multi-agent hierarchical architectures and mechanisms are designed to reduce the exploration space, deepen collaboration among agents, and facilitate the learning of macro strategies. Prior knowledge is used to group actions into a macro-micro hierarchical policy architecture, which effectively reduces the exploration space of the algorithm and enables agents to cooperate at multiple strategy levels. A macro-policy implicit constraint optimization mechanism is proposed, which overcomes the agent's difficulty in learning and exploring macro policies and broadens and deepens the agent's cognition of the decision space.
(3) Mixed-policy decision mechanism based on the value function. To address the poor interpretability of the model and the diversity of input feature information, a value-function-based mixed-policy decision mechanism is designed to deepen the agent's cognition and extraction of input feature information. Mixing the policy outputs formed from different feature inputs improves the learning ability of the model. At the same time, analyzing the feature attention weights reveals how much attention the agent pays to different information when deciding, improving the interpretability of the decision.
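A minimal sketch of mixing per-feature-stream policy outputs with inspectable attention weights, under the assumption that each input feature stream produces its own action logits (all names illustrative):

```python
import numpy as np

def mix_policies(feature_logits, attention_scores):
    """Mix per-stream policy logits with softmax attention weights.

    feature_logits: (n_streams, n_actions) logits from each feature stream.
    attention_scores: (n_streams,) raw scores over the streams.
    Returns the mixed logits and the weights; inspecting the weights shows
    which feature stream dominated the decision.
    """
    w = np.exp(attention_scores - attention_scores.max())
    w = w / w.sum()                    # attention weights, sum to 1
    mixed = w @ feature_logits         # weighted mixture of stream logits
    return mixed, w
```

The returned `w` is what gives the interpretability claimed above: it is a probability vector over feature streams that can be logged per decision.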
(4) Heuristic decision-making mechanism based on a heatmap. To address the poor interpretability and the learning and training difficulty of deep reinforcement learning decision models, an interpretable heatmap-based heuristic decision model is designed to improve the performance of traditional deep reinforcement learning algorithms. Integrating expert knowledge into the construction of the heatmap makes the agent's decision output interpretable, and endowing the agent with initial capability through imitation learning improves the learning and exploration abilities of traditional deep reinforcement learning. This is a valuable exploration of the knowledge-data hybrid-driven decision-making model.
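A toy sketch of a heatmap heuristic: expert knowledge assigns a desirability value to each grid cell, and the heuristic steers the agent toward the best cell, so the basis of the decision is directly visible in the map. The grid and its values are invented for illustration:

```python
import numpy as np

def heatmap_action(value_map):
    """Return the (row, col) of the most desirable cell in the heatmap."""
    idx = np.unravel_index(np.argmax(value_map), value_map.shape)
    return idx

# Expert-assigned desirabilities for a 2x2 toy grid.
hm = np.array([[0.1, 0.3],
               [0.9, 0.2]])
# heatmap_action(hm) -> (1, 0): the heuristic steers toward the 0.9 cell.
```

Because the heatmap itself is the explanation, a human can audit any decision by looking at the map, unlike the hidden activations of an end-to-end network.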
|Subject Area||Artificial Intelligence Theory ; Computer Neural Networks ; Knowledge Engineering|
|MOST Discipline Catalogue||Engineering::Control Science and Engineering|
|张朋朋. 基于关系网络的多智能体协同分层决策技术[D]. 中国科学院自动化研究所. 中国科学院自动化研究所,2022.|
|Similar articles in Google Scholar|
|Similar articles in Baidu academic|
|Similar articles in Bing Scholar|