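Research on Key Problems of Swarm Cooperative Decision-Making Based on Deep Reinforcement Learning (基于深度强化学习的群体协同决策关键问题研究) | |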
王彗木 | |
2021-04 | |
Pages | 130 |
Subtype | Doctoral |
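Abstract | Swarm intelligence originated from human observation and study of the collective behavior of social organisms. Owing to its distributed nature, simplicity, flexibility, and intelligence, it is widely used in fields such as search and rescue, urban security, and intelligent transportation, and it is one of the core research areas in China's “New Generation Artificial Intelligence Development Plan”. However, real-world tasks are mostly concurrent or composite, and they place high demands on swarm cooperative decision-making. Deep reinforcement learning methods, which have emerged in recent years, offer a new approach to improving swarm cooperative decision-making thanks to their strong learning and exploration capabilities, but many key problems remain to be solved. For example, in swarm systems, dynamically changing local observations make it difficult for agents to extract effective information for decision-making. In addition, the complex and time-varying interactions in swarm systems are difficult for agents to adapt to. Furthermore, the large number of potential communication partners within an agent's communication range causes communication redundancy, which interferes with the agent's decisions. Targeting the above factors that affect swarm cooperative decision-making, namely dynamic local observation information, neighborhood relations, and 2. To handle complex and time-varying interactions, a new soft attention mechanism is designed to process the complex interactions among agents, and a swarm reinforcement learning framework based on an enhanced attention mechanism is proposed to process time-varying interactions. The former assigns different weight coefficients to different subspaces and extracts effective deep subspace features, improving the agents' ability to handle complex interactions. The latter fuses a graph convolutional network with a long short-term memory network, so that it handles time-varying interactions while preserving the implicit spatial structure of the agents. Simulation results show that the framework effectively improves the agents' ability to extract complex, time-varying neighborhood relations and accelerates the convergence of policy training. 3. To address communication redundancy, a redundant-communication pruning method based on prior knowledge and cognitive differences is proposed. First, prior knowledge is designed to divide the agents into groups, and a graph attention mechanism processes the states of the grouped agents to obtain cross-group high-dimensional features. Second, based on these cross-group features, a variational autoencoder yields the posterior distribution of each agent's cognition of the environment, and redundant information is pruned using the Kullback-Leibler divergence computed from this posterior. Finally, an attention mechanism processes the pruned information in a differentiated manner. Simulation results show that the framework effectively improves both the agents' ability to prune redundant information and their decision-making ability. |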
Other Abstract | Swarm intelligence originates from human observation of and research on the collective behavior of social organisms. Its advantages of distribution, simplicity, flexibility, and robustness provide new solutions and ideas for many challenging and complex problems. It is one of the core research fields in China's “New Generation Artificial Intelligence Development Plan”. However, real-world tasks are mostly concurrent or composite, and they place high demands on swarm cooperative decision-making. Deep reinforcement learning (DRL) methods, which have emerged in recent years, offer an alternative way to improve swarm cooperative decision-making thanks to their strong learning and exploration capabilities, but many key issues remain to be resolved. For example, in swarm In order to improve swarm cooperative decision-making, this dissertation proposes a series of DRL-based methods for the above-mentioned problems, including dynamic local observation information, neighborhood relations, and redundant communication. The main work and novelties of this dissertation are summarized as follows: 2. To deal with the complex and time-varying interactions of agents, a new soft attention mechanism is designed to handle the complex interactions among the agents, and an attention-enhanced reinforcement learning framework is proposed to handle their time-varying interactions. The former extracts effective high-level |
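Keyword | Swarm systems; cooperative decision-making; multi-agent systems; deep reinforcement learning; graph convolutional networks; attention mechanism |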
Language | Chinese |
Sub direction classification | Reinforcement and Evolutionary Learning |
Document Type | Thesis |
Identifier | http://ir.ia.ac.cn/handle/173211/44958 |
Collection | Integrated Information System Research Center, Aircraft Intelligent Technology (综合信息系统研究中心_飞行器智能技术) |
Recommended Citation GB/T 7714 | 王彗木. 基于深度强化学习的群体协同决策关键问题研究[D]. 中国科学院大学. 中国科学院大学人工智能学院,2021. |
Files in This Item: |
File Name/Size | DocType | Version | Access | License |
Thesis.pdf(8945KB) | Thesis | | Not yet open | CC BY-NC-SA |
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.