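Research on Key Problems of Swarm Cooperative Decision-Making Based on Deep Reinforcement Learning (基于深度强化学习的群体协同决策关键问题研究)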
王彗木
2021-04
Pages: 130
Degree type: Doctoral
Chinese abstract

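      Swarm intelligence originated from human observation and study of the collective behavior of living organisms. Because of its advantages of distributedness, simplicity, flexibility, and intelligence, it is widely used in fields such as search and rescue, urban security, and intelligent transportation, and it is one of the core research areas in China's "New Generation Artificial Intelligence Development Plan". However, real-world tasks are mostly concurrent or composite, which places high demands on swarm cooperative decision-making. Deep reinforcement learning, which has emerged in recent years, offers a new approach to improving swarm cooperative decision-making thanks to its strong learning and exploration capabilities, but many key problems remain to be solved. For example, in swarm systems, dynamically changing local observations make it hard for agents to extract effective information for decision-making. In addition, the complex and time-varying interaction relationships in swarm systems are hard for agents to adapt to. Furthermore, the large number of communicable peers within an agent's communication range causes communication redundancy, which interferes with the agent's decisions.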

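      To address the above problems that affect swarm cooperative decision-making, namely dynamic local observations, neighborhood relations, and redundant communication, this dissertation proposes a series of swarm cooperative decision-making methods based on deep reinforcement learning: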
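1. For the problem of dynamic local observations of agents in complex dynamic environments, a dynamic-environment processing method based on graph convolutional networks and long short-term memory (LSTM) networks is proposed. On the one hand, graph convolution is fused with an attention mechanism: graph convolution enlarges each agent's communication range, and the attention mechanism treats the states of surrounding agents differentially, thereby promoting cooperation among agents. On the other hand, an LSTM network is introduced, and its ability to handle temporal relationships is used to map the spatial structure of dynamic entities, which improves the agents' ability to process dynamic local observations. Simulation results show that the method effectively improves the agents' cooperative decision-making ability in dynamic environments.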

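2. For the problem of handling complex and time-varying interaction relationships, a new soft attention mechanism is designed to handle the complex interactions among agents, and a swarm reinforcement learning framework based on an enhanced attention mechanism is proposed to handle time-varying interactions. The former assigns different weight coefficients to different subspaces and extracts effective deep subspace features, which improves the agents' ability to handle complex interactions. The latter fuses graph convolutional networks with LSTM networks, handling time-varying interactions while preserving the implicit spatial structure of the agents. Simulation results show that the framework effectively improves the agents' ability to extract complex, time-varying neighborhood relations and speeds up the convergence of policy training.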

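3. For the problem of communication redundancy, a redundant-communication pruning method based on prior knowledge and cognitive difference is proposed. First, prior knowledge is designed to divide the agents into groups, and a graph attention mechanism processes the states of the grouped agents to obtain cross-group high-dimensional features. Next, based on these cross-group features, a variational autoencoder yields the posterior distribution of each agent's cognition of the environment, and redundant information is pruned using the Kullback-Leibler divergence based on that posterior distribution. Finally, an attention mechanism processes the pruned information differentially. Simulation results show that the framework effectively improves the agents' ability to prune redundant information as well as their decision-making ability.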
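      Overall, starting from the goal of improving swarm cooperative decision-making, this dissertation addresses several key problems that affect swarm cooperative decision-making behavior, including dynamic local observations, neighborhood relations, and communication redundancy. It proposes a series of swarm cooperative decision-making methods based on deep reinforcement learning, validates them in a series of complex simulated cooperative and adversarial tasks, and offers a constructive application-oriented exploration toward improving swarm cooperative decision-making.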
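
To make the attention-based neighbor processing in item 1 concrete, the following is a minimal sketch and not the dissertation's implementation. It assumes scaled dot-product scoring (the abstract does not say which scoring function is used), and the names `attention_aggregate`, `states`, and `adjacency` are hypothetical. It shows one graph hop in which an agent weights the states of the agents within its range and sums them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_aggregate(states, adjacency, agent):
    """Aggregate the neighbor states of `agent` with dot-product attention.

    states: one feature vector per agent (hypothetical layout).
    adjacency: adjacency[i][j] is 1 if agent j is within range of agent i.
    Returns (attention weight per neighbor index, aggregated vector).
    """
    neighbors = [j for j, linked in enumerate(adjacency[agent]) if linked]
    if not neighbors:
        # Assumption: an isolated agent falls back to its own state.
        return {}, list(states[agent])
    dim = len(states[agent])
    scale = math.sqrt(dim)
    scores = [dot(states[agent], states[j]) / scale for j in neighbors]
    weights = softmax(scores)
    aggregated = [
        sum(w * states[j][k] for w, j in zip(weights, neighbors))
        for k in range(dim)
    ]
    return dict(zip(neighbors, weights)), aggregated


# Agent 0 sees agents 1 and 2; agent 1 is more similar to agent 0.
states = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
adjacency = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
weights, aggregated = attention_aggregate(states, adjacency, 0)
print(weights, aggregated)
```

Applying such an aggregation twice lets information from two-hop neighbors reach an agent, which is one way to read the "enlarged communication range" attributed to graph convolution above.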

English abstract

Swarm intelligence originates from human observation and study of collective biological behaviors. Its advantages of distribution, simplicity, flexibility, and robustness provide new solutions and ideas for many challenging and complex problems. It is one of the core research fields in China's "New Generation Artificial Intelligence Development Plan". However, real-world tasks are mostly concurrent or comprehensive, which places high demands on swarm cooperative decision-making capabilities. Deep reinforcement learning (DRL) methods, which have emerged in recent years, offer an alternative way to improve swarm cooperative decision-making thanks to their strong learning and exploration characteristics, but many key issues remain to be resolved. For example, in swarm systems, dynamic local observation information makes it difficult for agents to extract effective features for decision-making. In addition, complex and time-varying interactions make it difficult for agents to adapt to their environments. Furthermore, the large number of communicable objects within the communication range of each agent causes communication redundancy and interferes with the agents' policies.

To improve swarm cooperative decision-making, this dissertation proposes a series of DRL-based methods for the above-mentioned problems: dynamic local observation information, neighborhood relations, and redundant communication. The main work and contributions of this dissertation are summarized as follows:
1. To deal with the problem of dynamic local observation information, a local observation processing scheme based on a graph convolutional network and a long short-term memory (LSTM) network is proposed. On the one hand, by combining graph convolutional networks with an attention mechanism, the states of each agent's neighbors are processed differentially and the communication range of each agent is expanded. On the other hand, an LSTM is introduced, and its ability to process temporal relationships is used to map the spatial structure of dynamic entities, thereby enhancing the agents' ability to process dynamic local observation information. Simulation results indicate that the scheme effectively improves the agents' cooperative decision-making ability in dynamic environments.

2. To deal with the problem of complex and time-varying interactions among agents, a new soft attention mechanism is designed to handle the complex interactions, and an attention-enhanced reinforcement learning framework is proposed to handle the time-varying interactions. The former extracts effective high-level hidden states by assigning different weight coefficients to different subspaces, thereby promoting agent cooperation. The latter combines a graph convolutional network and an LSTM to process the time-varying interactions while retaining the implicit spatial structure of the agents. Simulation results show that the framework effectively improves the ability to process complex and time-varying interactions and speeds up convergence.
3. To deal with the problem of redundant communication, a new information pruning method based on prior knowledge and cognitive difference is proposed. Prior knowledge is used to cluster the agents into groups, and an attention mechanism is used to extract high-dimensional state representations of the different groups. Next, the posterior distributions of the agents' understanding of the environment are obtained with a variational autoencoder based on the group state representations. Kullback-Leibler divergence is then used to prune redundant information based on the posterior distributions. Finally, a graph attention mechanism is used to process the remaining information. Simulation results demonstrate that the method effectively improves both the agents' ability to prune redundant information and the decision-making ability of the swarm system.

On the whole, starting from the goal of improving swarm cooperative decision-making, this dissertation proposes a series of improvement schemes for several key problems that affect swarm cooperative decision-making behaviors, including dynamic local observation information, neighborhood relations, and redundant communication. It also validates the proposed methods and frameworks through a series of complex simulation scenarios and offers a constructive application-oriented discussion of how to improve swarm cooperative decision-making ability.
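
The Kullback-Leibler pruning step in item 3 can be illustrated with a small sketch. This is not the dissertation's method: it assumes each agent's posterior is a diagonal Gaussian (a common choice for variational autoencoders), and the pruning rule (drop senders whose posterior is too close to the receiver's own) and the names `kl_diag_gaussian`, `prune_messages`, and `threshold` are assumptions made for illustration.

```python
import math


def kl_diag_gaussian(mu_p, var_p, mu_q, var_q):
    """KL(P || Q) for diagonal Gaussians given per-dimension means and variances."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def prune_messages(own, others, threshold):
    """Return the senders whose posterior differs enough from `own`.

    own: (mean, variance) of the receiving agent's posterior.
    others: sender id -> (mean, variance) of that sender's posterior.
    threshold: hypothetical cutoff; a divergence below it is treated as redundant.
    """
    own_mu, own_var = own
    kept = []
    for sender, (mu, var) in others.items():
        if kl_diag_gaussian(mu, var, own_mu, own_var) >= threshold:
            kept.append(sender)
    return kept


own = ([0.0, 0.0], [1.0, 1.0])
others = {
    "a": ([0.0, 0.0], [1.0, 1.0]),  # identical belief: divergence is zero
    "b": ([2.0, 0.0], [1.0, 1.0]),  # shifted mean: divergence is 2.0
}
print(prune_messages(own, others, threshold=0.5))  # prints ['b']
```

A fixed threshold is the simplest possible rule; a learned or adaptive cutoff would fit the same interface.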
 

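Keywords: swarm systems; cooperative decision-making; multi-agent systems; deep reinforcement learning; graph convolutional networks; attention mechanism
Language: Chinese
Seven major directions, sub-direction classification: Reinforcement and evolutionary learning
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/44958
Collection: Research Center for Integrated Information Systems_Aircraft Intelligent Technology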
Recommended citation
GB/T 7714
王彗木. 基于深度强化学习的群体协同决策关键问题研究[D]. 中国科学院大学. 中国科学院大学人工智能学院,2021.
Files in this item
File name/size; document type; version type; access type; license
Thesis.pdf (8945KB); thesis; (no version type listed); not yet open; CC BY-NC-SA