Knowledge Commons of Institute of Automation, CAS
基于值分解优化的多智能体深度强化学习方法研究 (Research on Multi-Agent Deep Reinforcement Learning Methods Based on Value Decomposition Optimization)
王凌霄 (Wang Lingxiao)
2021-05-26
Pages | 100
Degree Type | Master's
Chinese Abstract | With the substantial improvement in the practical performance of deep learning algorithms and the overall maturity of the surrounding hardware and software stack, deep learning has been applied in cutting-edge interdisciplinary research across the fields of information science. Over the past three years, deep learning methods have been successfully explored in multi-agent reinforcement learning, and deep multi-agent reinforcement learning has become one of the fastest-growing subfields of artificial intelligence in recent years. Algorithms in this area can be grouped into several broad technical routes according to their design principles, chiefly methods based on synchronous communication and methods based on value function decomposition. Targeting the new problems arising in multi-agent, continuous-time, real-time decision-making settings with unstable communication, this thesis discusses the application limitations and research potential of several existing algorithms and, drawing on recent results in reinforcement learning and graph neural networks, proposes improved multi-agent deep reinforcement learning algorithms based on value function decomposition.
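The value-function-decomposition route named in the abstract can be illustrated by a minimal sketch of VDN-style additive mixing; all numbers, shapes, and names below are illustrative assumptions, not taken from the thesis itself:

```python
import numpy as np

def decompose_and_act(agent_qs):
    """VDN-style additive value decomposition (illustrative sketch).

    Each agent picks its greedy action from its own local Q_i; the joint
    value Q_tot is the sum of the chosen per-agent values, so decentralised
    greedy execution is consistent with a centralised argmax over the
    joint action.
    """
    greedy_actions = agent_qs.argmax(axis=1)  # each agent acts on its own Q_i
    q_tot = agent_qs.max(axis=1).sum()        # additive mixing of chosen values
    return greedy_actions, q_tot

# Hypothetical per-agent action-value estimates: shape (n_agents, n_actions).
agent_qs = np.array([[0.2, 1.5, 0.3],
                     [0.9, 0.1, 0.4]])
actions, q_tot = decompose_and_act(agent_qs)
# actions -> array([1, 0]); q_tot -> 2.4
```

Later mixers such as QMIX replace the plain sum with a learned monotonic mixing network, but the decentralised-greedy property sketched here is the core of the route.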
English Abstract | With the significant improvement in deep learning algorithms and in high-performance hardware and software platforms, deep learning has driven many cutting-edge research efforts in multi-agent reinforcement learning, and deep multi-agent reinforcement learning has become one of the most rapidly developing subfields of artificial intelligence in recent years. In the last three years, many algorithms and models have emerged in deep multi-agent reinforcement learning; these can be classified into several broad categories based on their design principles, among which the main technical routes include synchronous-communication-based methods, value-function-decomposition-based methods, and so on. This thesis discusses the application limitations and research potential of several existing algorithms for the new problems arising in multi-agent, continuous-time, real-time decision-making environments with unstable communication, drawing on the latest results in reinforcement learning and graph neural networks, and presents an improved multi-agent deep reinforcement learning algorithm based on value function decomposition. The improved method has two main innovations. First, it introduces asynchronous historical observations in the execution phase of the value function decomposition algorithm, which reduces the bandwidth burden of a synchronous communication mechanism while enriching the external features available to the agents when making decisions at the current timestep. This improvement strikes a balance between decision performance and computational overhead during execution and yields a unified algorithmic framework spanning value function decomposition and synchronous communication methods.
Second, it introduces implicit-graph relation mining in the learning phase of the value function decomposition algorithm: an attention mechanism computes weight coefficients among the agents to obtain the adjacency matrix of an implicit graph, on which a graph convolution is performed over the agents' action-value vectors. This improvement lets the relationships between agents be generated automatically, without expert knowledge, and brings a graph neural network module into the action-value aggregation step. The proposed improvements are evaluated on different tasks in the StarCraft Multi-Agent Challenge environment and compared with classical methods; the proposed algorithm outperforms the classical algorithms in the experimental environment. |
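The second innovation described in the abstract (attention weights among agents forming an implicit adjacency matrix, then a graph convolution over the agents' action-value vectors) can be sketched roughly as follows; the shapes, weight matrices, and single-head dot-product attention are illustrative assumptions, not the thesis's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax_rows(x):
    """Row-wise softmax, numerically stabilised."""
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def implicit_graph_mixing(h, q, w_query, w_key):
    """Attention-derived implicit graph over agents, then one graph
    convolution step aggregating their action-value vectors (sketch)."""
    queries = h @ w_query                           # (n_agents, d_k)
    keys = h @ w_key                                # (n_agents, d_k)
    scores = queries @ keys.T / np.sqrt(keys.shape[1])
    adj = softmax_rows(scores)                      # soft adjacency matrix of the implicit graph
    return adj, adj @ q                             # aggregate neighbours' Q-vectors

# Hypothetical dimensions and randomly initialised inputs for illustration.
n_agents, d, d_k, n_actions = 3, 4, 4, 5
h = rng.normal(size=(n_agents, d))                  # per-agent hidden features
q = rng.normal(size=(n_agents, n_actions))          # per-agent action-value vectors
w_query = rng.normal(size=(d, d_k))
w_key = rng.normal(size=(d, d_k))

adj, mixed_q = implicit_graph_mixing(h, q, w_query, w_key)
# adj has shape (n_agents, n_agents) with rows summing to 1;
# mixed_q has shape (n_agents, n_actions)
```

Because the adjacency matrix is produced by attention over learned features, no expert-specified agent graph is needed, which matches the "automatic relation mining" point in the abstract.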
Keywords | Deep Reinforcement Learning; Multi-Agent Systems; Value Function Decomposition Algorithms; Graph Neural Networks
Language | Chinese
Seven Major Directions - Subdirection Classification | Theory and Methods of Decision Intelligence
Document Type | Thesis
Identifier | http://ir.ia.ac.cn/handle/173211/44697
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems_Complex Systems Intelligence Mechanism and Parallel Control Team
Recommended Citation (GB/T 7714) | 王凌霄. 基于值分解优化的多智能体深度强化学习方法研究[D]. 中国科学院自动化研究所, 2021.
Files in This Item:
File Name/Size | Document Type | Version | Access | License
基于值分解优化的多智能体深度强化学习方法 (13415 KB) | Thesis | | Open Access | CC BY-NC-SA