Research on Multi-Agent Intelligent Game Decision-Making Algorithms for Wargaming


	Research on Multi-Agent Intelligent Game Decision-Making Algorithms for Wargaming
	Yu Zhaoke (余照科)
	2023-12-02
Pages	86
Degree type	Master's
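Abstract (translated from the Chinese)	In recent years, as deep reinforcement learning has reached human-level performance in perception tasks such as image and speech recognition and natural language processing, attention has turned to intelligent decision-making technology, which emphasizes cognition and decision-making. Since 2015, this technology has achieved a series of breakthroughs in games such as Go, Texas hold'em poker, and StarCraft, and has found practical application in UAV control, autonomous driving, and robot cooperation. Applying intelligent game techniques to wargaming can effectively accelerate the military decision-making cycle, and the varied scenarios of wargaming in turn provide a setting for studying those techniques; as a result, intelligent game techniques for wargaming have become an active research topic. Because battlefield environments are diverse, wargame maps are also of many kinds, so research on wargaming is usually conducted on a scenario composed of one specific terrain and a fixed type and number of operators (controllable units). This thesis first presents a distributed parallel reinforcement learning training platform for wargaming, designed from a data-flow perspective, together with the distributed acceleration techniques explored on it, and then describes work built on that platform that studies intelligent decision-making for wargaming, progressing from simple to complex scenarios. For the medium-relief terrain scenario with homogeneous operators, the thesis first introduces an inference-based analysis method and an approximate theoretical solution of the wargame, then proposes a deep reinforcement learning algorithm based on self-play, and finally reports experimental validation on the platform. For the water-network rice-paddy scenario with heterogeneous operators, the thesis first describes knowledge analysis and modeling and the construction of a knowledge-based AI, and then presents the process and results of staged, hybrid-driven deep reinforcement learning training on the platform. The thesis makes two main contributions. First, it builds a distributed parallel reinforcement learning training platform for wargaming; the platform implements program-level parallelism and data-flow optimization, which raise throughput and speed up data processing, and thereby significantly accelerate deep reinforcement learning training. Second, it proposes intelligent decision-making methods for wargaming: in simple scenarios with homogeneous operators, an improved self-play algorithm steadily raises the agent's level of play, and in complex scenarios with heterogeneous operators, a staged, hybrid-driven reinforcement learning algorithm improves the agent's decision-making. Building on this platform and these algorithms, the proposed hybrid-driven algorithm placed fourth in the second round (复赛) of the Tencent "Kaiwu" (开悟) Honor of Kings Invitational and third in the first National Air Game Competition (全国空中博弈大赛).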
English abstract	In recent years, as deep reinforcement learning has reached human level in perception fields such as image and speech recognition and natural language processing, researchers have begun to turn their attention to intelligent decision-making technology, which focuses on cognitive decision-making. Since 2015, intelligent decision-making technology has made a series of breakthroughs in games such as Go, Texas hold'em poker, and StarCraft, and has seen practical applications in areas such as drone control, autonomous driving, and robot cooperation. Applying intelligent game technologies to wargaming can not only effectively accelerate the military decision-making cycle, but also allows the different types of scenarios in wargaming to be used to study intelligent game technologies, which has made research on intelligent game technology in wargaming a hot topic. Because battlefield environments vary, wargame maps are also of many types. Research on wargaming is therefore often based on a scenario composed of a specific terrain and a fixed type and number of operators (operable units). This thesis first introduces a distributed parallel reinforcement learning training platform for wargaming, built from the perspective of data flow, and demonstrates the distributed acceleration technology explored on this basis; it then introduces work on intelligent decision-making techniques for wargaming, carried out on this platform and proceeding from simple to complex. In the medium-relief terrain scenario with homogeneous operators, the thesis first introduces the reasoning analysis method and the approximate theoretical solution of the wargame, then proposes a deep reinforcement learning algorithm based on self-play, and finally presents the experimental verification results obtained on the platform. In the water-network rice-paddy scenario with heterogeneous operators, the thesis first introduces the process of knowledge analysis and modeling and the construction of a knowledge AI, and then demonstrates the process and results of staged, hybrid-driven deep reinforcement learning training on the platform. The thesis has two main contributions. The first is a distributed parallel reinforcement learning training platform for wargaming. The platform realizes features such as program parallelism and data flow optimization, thereby increasing throughput and speeding up data processing, so that the training process of deep reinforcement learning can be significantly accelerated. The second is an intelligent decision-making method for wargaming. In simple scenarios with homogeneous operators, the intelligence level of the agent can be raised through an improved self-play algorithm. In complex scenarios with heterogeneous operators, the decision-making level of the agent can be improved through a staged, hybrid-driven reinforcement learning algorithm. Based on the above training platform and algorithmic innovations, the proposed hybrid-driven algorithm won fourth place in the second round of the Tencent "Kaiwu" Honor of Kings Invitational Tournament and third place in the first National Air Game Competition.
Keywords	wargaming, intelligent decision-making, multi-agent, deep reinforcement learning, distributed training
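Indexed by	Other
Language	Chinese
Representative paper	Yes
Seven major directions: sub-direction classification	Machine learning
State Key Laboratory planned direction classification	Human-machine-algorithm hybrid and collaborative decision-making
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/50905
Collection	Graduates_Master's theses
Recommended citation (GB/T 7714)	Yu Zhaoke (余照科). Research on Multi-Agent Intelligent Game Decision-Making Algorithms for Wargaming[D],2023.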

Files in this item
File name/size	Document type	Version type	Access type	License
毕业论文_普通.pdf (15273 KB)	Thesis		Restricted access	CC BY-NC-SA