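Learning-Based Group Game Strategies and Their Application in Wargaming (学习型群体博弈策略及其在兵棋推演中的应用)
Xie Yang (谢阳)
Subtype: Master's
Thesis Advisor: Fan Guoliang (范国梁)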
2019-05-21
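Degree Grantor: University of Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Name: Master of Engineering
Degree Discipline: Control Engineering
Keyword: reinforcement learning; wargaming; situation analysis; Monte Carlo search; artificial intelligence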
Abstract

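With the rapid development of computer hardware, research on computer game playing has achieved substantial results in traditional board games. Wargaming, an important method of combat simulation, is attracting growing attention. Based on the rules of an army combined-arms tactical wargame, this thesis studies adversarial game playing in wargaming. By analyzing the complexity of wargaming, it proposes a hierarchical decision-making model driven jointly by knowledge and data. To meet the decision requirements of this model, it proposes a method for analyzing the battlefield situation in wargaming. Taking the specifics of wargaming into account, it improves Monte Carlo search and proposes a Monte Carlo search algorithm with initial value optimization. Based on the proposed hierarchical decision-making model, wargame AIs are designed using the Monte Carlo search algorithm with initial value optimization and the Deep-Sarsa algorithm respectively, and experiments verify the effectiveness of the model and the algorithms. The main work and contributions of this thesis are as follows:

1. A hierarchical decision-making model for wargaming, driven jointly by knowledge and data, is proposed. The model controls operator maneuver in a layered fashion: the maneuver controller consists of an upper-level macro controller driven by data and a neural network, and a lower-level interpretive controller driven by knowledge and rules. The macro controller outputs macro decisions, and the interpretive controller uses situation analysis to interpret each macro decision and output basic actions. Other actions, such as seizing control points, shooting, and dismounting, are controlled by knowledge-based rules.

2. A battlefield situation analysis method for wargaming is proposed, comprising static, real-time, and ultra-real-time situation analysis. Static situation assessment: the mean-shift algorithm clusters the key terrain on the map and divides the battlefield into several regions; analyzing the connections between these regions yields different maneuver plans. Real-time situation assessment: a method for quantifying the value of each map cell is proposed; combined with an operator's action intent, it computes the combat value of each cell for that intent and selects the cell with the highest combat value as the optimal target point. Ultra-real-time situation assessment: the enemy's combat intent is analyzed and, using the battlefield segmentation from the static assessment together with the optimal-target-point method from the real-time assessment, a set of possible enemy positions is built; all enemy positions in the set are considered, and the battlefield situation is predicted through parallel simulation search.

3. A Monte Carlo search algorithm with initial value optimization is proposed for wargaming. Monte Carlo search is compared with Monte Carlo tree search, the shortcomings of Monte Carlo search and their causes are analyzed, and Monte Carlo search is improved for the specifics of wargaming. The algorithm combines the UCB formula with Monte Carlo search and introduces a neural network to initialize the UCB parameters, which carry the value information of child nodes and guide the search. In turn, the neural network learns new experience from the search results and is updated iteratively.

4. Two reinforcement learning AIs for wargaming are designed, one using the Monte Carlo search algorithm with initial value optimization and the other using the Deep-Sarsa algorithm. After iterative training, the AI based on Monte Carlo search with initial value optimization defeated, with a 90% win rate, a knowledge- and rule-based AI modeled on "CASIA-先知V1.0". The AI based on Deep-Sarsa likewise defeated, with a 90% win rate, the runner-up AI of the "人机对抗全国挑战赛" (National Human-Machine Confrontation Challenge).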
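
To make item 3 more concrete, the following is a minimal sketch of how a UCB rule seeded with prior values might guide a flat Monte Carlo search. It is not the thesis's algorithm: the abstract does not give the formula, so the standard UCB1 score is assumed, and the names `monte_carlo_search`, `prior`, `prior_weight`, and `rollout` are hypothetical. The `prior` callable stands in for the neural network, and its output is treated as `prior_weight` pseudo-visits (which must be positive).

```python
import math
from typing import Callable, Dict, Sequence, Tuple


def ucb_score(q: float, n: float, total: float, c: float) -> float:
    """Standard UCB1 score: mean value plus an exploration bonus."""
    # log(total + 1) keeps the bonus well defined for small pseudo-counts.
    return q + c * math.sqrt(math.log(total + 1.0) / n)


def monte_carlo_search(
    actions: Sequence[str],
    rollout: Callable[[str], float],
    prior: Callable[[str], float],
    budget: int = 200,
    prior_weight: float = 1.0,
    c: float = 1.4,
) -> Tuple[str, Dict[str, float], Dict[str, float]]:
    """Flat Monte Carlo search over root actions with prior-seeded statistics."""
    # Assumption: the prior (a stand-in for the neural network) supplies an
    # initial value per action, counted as `prior_weight` pseudo-visits.
    q = {a: prior(a) for a in actions}
    n = {a: prior_weight for a in actions}
    for _ in range(budget):
        total = sum(n.values())
        a = max(actions, key=lambda x: ucb_score(q[x], n[x], total, c))
        reward = rollout(a)             # one simulated playout after action a
        n[a] += 1.0
        q[a] += (reward - q[a]) / n[a]  # incremental mean, prior included
    best = max(actions, key=lambda x: n[x])  # recommend the most-visited action
    return best, q, n


# Toy usage with a made-up, deterministic payoff table in place of a simulator.
payoff = {"hold": 0.1, "flank": 0.5, "advance": 0.9}
best, q, n = monte_carlo_search(list(payoff), lambda a: payoff[a], prior=lambda a: 0.5)
```

In a learning loop, the refined values in `q` could serve as training targets for the prior, which is one plausible reading of the network "learning from the search results"; the abstract does not specify the actual training signal.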

Other Abstract

With the rapid development of computer hardware, research on computer game playing has made great progress in traditional board games. As an important tool for battle simulation, wargaming has received growing attention. Based on the rules of an army combined-arms tactical wargame, this thesis studies game strategy in wargaming. By analyzing the complexity of wargaming, we propose a hierarchical decision-making model driven by knowledge and data. To meet the model's decision requirements, we propose a method for analyzing the wargame situation. Considering the specifics of wargaming, we propose an improved Monte Carlo search algorithm with initial value optimization. We then design wargame AIs on top of the hierarchical decision-making model, using the improved Monte Carlo search algorithm and the Deep-Sarsa algorithm respectively, and verify the validity of the model and the algorithms through experiments. The main work and innovations of this thesis are as follows:

1. We propose a hierarchical decision-making model for wargaming driven by knowledge and data. The model controls operator maneuver hierarchically. The maneuver controller comprises an upper macro controller driven by data and a neural network, and a lower interpretive controller driven by knowledge and rules. The macro controller outputs macro decisions, and the interpretive controller translates each macro decision into basic actions through situation analysis. Other actions, such as seizing control points, shooting, and dismounting, are controlled by knowledge and rules.

2. We propose a method for analyzing the battlefield situation in wargaming, comprising static, real-time, and ultra-real-time situation analysis. Static situation analysis: the mean-shift algorithm clusters the key terrain of the map and divides the battlefield into several areas; different maneuver schemes are then formulated by analyzing the connections between these areas. Real-time situation analysis: we propose a method for quantifying cell value; given an operator's moving intention, it calculates the combat value of each cell for that intention and selects the cell with the highest combat value as the optimal target. Ultra-real-time situation analysis: the enemy's combat intention is analyzed and, by combining the battlefield segmentation from the static analysis with the optimal-target method from the real-time analysis, a set of possible enemy locations is established; all enemy positions in the set are considered, and the battlefield situation is predicted through parallel simulation.

3. We propose an improved Monte Carlo search algorithm with initial value optimization. We compare Monte Carlo search with Monte Carlo tree search, analyze the shortcomings of Monte Carlo search and their causes, and improve the algorithm for the specifics of wargaming. The algorithm combines the UCB formula with Monte Carlo search and introduces a neural network to initialize the UCB parameters. The value information of child nodes is passed through the UCB parameters to guide the search. In turn, the neural network learns new knowledge from the search results and updates its parameters.

4. We design two wargame AIs based on reinforcement learning. One uses the Monte Carlo search algorithm with initial value optimization, and the other uses the Deep-Sarsa algorithm. After iterative training, the AI based on Monte Carlo search with initial value optimization defeated a rule-based AI with a 90% win rate. The AI based on Deep-Sarsa also defeated, with a 90% win rate, a rule-based AI that ranked second in the "National Wargame Competition".
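
The Deep-Sarsa agent in item 4 builds on the on-policy Sarsa temporal-difference update, which can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: a linear value function replaces the deep network to keep the example self-contained, and the function names, feature vectors, and hyperparameters (`alpha`, `gamma`) are hypothetical.

```python
from typing import List, Sequence, Tuple


def q_value(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear action value Q(s, a) = w . phi(s, a)."""
    return sum(w * x for w, x in zip(weights, features))


def sarsa_update(
    weights: Sequence[float],
    features: Sequence[float],       # phi(s, a) for the action actually taken
    reward: float,
    next_features: Sequence[float],  # phi(s', a') for the action actually chosen next
    done: bool,
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> Tuple[List[float], float]:
    """One Sarsa step: move Q(s, a) toward r + gamma * Q(s', a')."""
    # On-policy target: bootstrap from the next action the agent really takes,
    # not from the greedy maximum (which would be Q-learning).
    q_next = 0.0 if done else q_value(weights, next_features)
    td_error = reward + gamma * q_next - q_value(weights, features)
    # Gradient of a linear Q with respect to the weights is the feature vector.
    new_weights = [w + alpha * td_error * x for w, x in zip(weights, features)]
    return new_weights, td_error


# Toy usage with made-up two-dimensional features.
w, delta = sarsa_update([0.0, 0.0], [1.0, 0.0], 1.0, [0.0, 1.0], done=False)
```

With a deep network, the same TD error would drive a gradient step on the network parameters instead of the explicit weight update shown here.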

Pages: 101
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/23928
Collection: Graduates_Master's Theses
Recommended Citation
GB/T 7714
谢阳. 学习型群体博弈策略及其在兵棋推演中的应用[D]. 中国科学院自动化研究所. 中国科学院大学,2019.
Files in This Item:
File Name/Size | DocType | Version | Access | License
学习型群体博弈策略及其在兵棋推演中的应用 (7147KB) | Thesis | | Not yet open | CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.