Research on Adversarial Learning and Adaptation Algorithms in Two-Player Zero-Sum Games
Author: 吴哲
Date: 2022-05-18
Pages: 90
Degree Type: Master
Chinese Abstract

    The goal of computer game research is to design agents that can think and make decisions autonomously in complex game environments. It is a rapidly developing interdisciplinary field that continuously absorbs the latest advances from game theory, psychology, reinforcement learning, deep learning, and related areas. As the fundamental model of computer games, the two-player zero-sum game has long been an important problem in artificial intelligence research owing to its strong interpretability and broad applicability. In recent years, a series of studies on two-player zero-sum games has achieved milestone breakthroughs in problems such as Go and Texas Hold'em. Nevertheless, designing efficient, adaptive learning algorithms for highly complex, strongly adversarial environments remains one of the key challenges in two-player zero-sum games. This thesis systematically studies adversarial learning and adaptation in two-player zero-sum games from two perspectives: environment construction and algorithm innovation. The main contributions of this thesis are summarized as follows:
    1. Regarding game environment construction, this thesis builds a general training and evaluation platform for two-player zero-sum games that covers both discrete and continuous action spaces. The platform overcomes the limitations of existing benchmarks, namely the lack of high-performance algorithm implementations and the lack of support for two-player game settings, and it is accompanied by a rich set of baseline algorithms and comprehensive evaluation metrics.
    2. Regarding game algorithm design, this thesis focuses on adaptive algorithms for two-player zero-sum games, addressing two drawbacks of existing methods: equilibrium-solving approaches produce overly conservative strategies that cannot guarantee payoff maximization, while opponent-modeling approaches are difficult to build and generalize poorly across strategies. The thesis first proposes two schemes for generating opponent strategies with diverse styles. On this basis, and inspired by meta-learning, it proposes an adaptive training framework. The framework uses a meta-strategy update method that adjusts the network weights of the meta-model to the current opponent type, enabling fast adaptation.
    3. This thesis carries out thorough experiments in multiple types of complex adversarial scenarios with both discrete and continuous action spaces, together with ablation studies and analyses of each module. Extensive experimental results show that the proposed algorithms effectively overcome the drawbacks of existing methods and adapt quickly to opponents of unknown styles, providing a new approach to payoff maximization in two-player zero-sum games.

English Abstract

    Computer game research aims to design agents that can think and make decisions autonomously in complex game environments. It is a rapidly growing interdisciplinary research field that continues to absorb the latest advances from game theory, psychology, reinforcement learning, deep learning, and other fields. The two-player zero-sum game, the fundamental model of computer games, has long been an important problem in artificial intelligence research due to its strong interpretability and general applicability. A series of studies on two-player zero-sum games has recently achieved landmark breakthroughs in problems such as Go and Texas Hold'em. Nevertheless, designing efficient learning algorithms with adaptability in highly complex, strongly adversarial environments remains one of the critical challenges in two-player zero-sum games. This thesis systematically studies the adversarial learning and adaptation problem in two-player zero-sum games in terms of both environment construction and algorithm innovation. The main contributions of this thesis are summarized as follows.
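    For reference, the standard matrix-game formalization of this model (a textbook definition, not reproduced from the thesis) makes the later discussion of equilibrium strategies precise. With A denoting player 1's payoff matrix and x, y mixed strategies of the two players, the minimax theorem states that the game has a well-defined value:

        % Two-player zero-sum game in matrix form: player 2's payoff is -x^T A y.
        % \Delta_m and \Delta_n are the probability simplices over the players' action sets.
        \[
          V \;=\; \max_{x \in \Delta_m} \min_{y \in \Delta_n} x^{\top} A y
            \;=\; \min_{y \in \Delta_n} \max_{x \in \Delta_m} x^{\top} A y .
        \]
        % A maximin (Nash) strategy x^* secures at least V against every opponent,
        % but against a fixed, exploitable opponent strategy y_0 it may earn strictly
        % less than the best-response payoff \max_{x} x^{\top} A y_0; this gap is the
        % "conservativeness" that motivates the adaptive algorithms in contribution 2.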
    1. In terms of game environment construction, this thesis builds a general training and evaluation platform covering both discrete and continuous action spaces for two-player zero-sum games. The environment overcomes the limitations of existing benchmarks, namely the lack of high-performance algorithm implementations and the lack of support for two-player game settings, and it is accompanied by a rich set of baseline algorithms and well-developed evaluation metrics.
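    As a rough illustration of what the interface of such a two-player training and evaluation platform might look like, the following Python sketch wraps an arbitrary game engine behind a single step function that enforces the zero-sum reward convention for both discrete and continuous action spaces. All class and method names here are hypothetical assumptions for illustration, not the platform's actual API.

        from dataclasses import dataclass
        from typing import Any, Dict

        import numpy as np


        @dataclass
        class StepResult:
            """Joint transition returned to both players."""
            obs: Dict[str, np.ndarray]   # per-player observations
            reward: Dict[str, float]     # zero-sum: reward["p0"] == -reward["p1"]
            done: bool
            info: Dict[str, Any]


        class TwoPlayerZeroSumEnv:
            """Hypothetical wrapper unifying discrete and continuous action spaces."""

            def __init__(self, base_env):
                self.base_env = base_env  # any engine exposing reset()/step()

            def reset(self) -> Dict[str, np.ndarray]:
                return self.base_env.reset()

            def step(self, actions: Dict[str, np.ndarray]) -> StepResult:
                obs, r0, done, info = self.base_env.step(actions)
                # Enforce the zero-sum convention so evaluation metrics such as
                # head-to-head payoff or Elo can be computed from a single scalar.
                return StepResult(obs, {"p0": float(r0), "p1": -float(r0)}, done, info)

    Under this kind of design, baseline agents and evaluation metrics only need to target one interface regardless of the underlying game.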
    2. In terms of game algorithm design, this thesis focuses on adaptive algorithms for two-player zero-sum games, addressing two drawbacks of existing methods: equilibrium-solving approaches yield overly conservative strategies that cannot guarantee the maximum payoff, and opponent-modeling approaches are hard to construct and generalize poorly across strategies. This thesis first proposes two schemes for generating opponent strategies with diverse styles. Building on these, and inspired by meta-learning, it proposes an adaptive training framework. The framework uses a meta-strategy update method that adjusts the network weights of the meta-model to the current opponent type to accomplish fast adaptation.
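    The fast-adaptation step can be pictured with a short MAML-style sketch: clone the meta-model and take a few gradient steps on data collected against the current opponent, so the returned policy is specialized to that opponent's style. This is only a generic gradient-based meta-learning illustration; the function names, the choice of PyTorch, and the loss are assumptions, not the thesis's actual meta-strategy update.

        import copy

        import torch


        def adapt_to_opponent(meta_policy: torch.nn.Module,
                              opponent_batch,          # trajectories vs. the current opponent
                              loss_fn,                 # e.g., a policy-gradient surrogate loss
                              inner_lr: float = 0.01,
                              inner_steps: int = 3) -> torch.nn.Module:
            """Inner loop of MAML-style adaptation: a few gradient steps on
            opponent-specific data yield a policy tuned to that opponent type."""
            adapted = copy.deepcopy(meta_policy)
            optimizer = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
            for _ in range(inner_steps):
                optimizer.zero_grad()
                loss = loss_fn(adapted, opponent_batch)
                loss.backward()
                optimizer.step()
            return adapted

    In the outer loop, the meta-policy itself would be trained so that a handful of such inner steps suffices for any opponent drawn from the pool of generated styles.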
    3. This thesis conducts thorough experiments in multiple types of complex adversarial scenarios with both discrete and continuous action spaces, along with ablation studies and analyses of each module. Extensive experimental results show that the proposed algorithm effectively overcomes the drawbacks of existing methods and adapts quickly to opponents of unknown styles, thus providing a new approach to payoff maximization in two-player zero-sum games.

 

Keywords: Computer Games; Two-Player Zero-Sum Games; Nash Equilibrium; Opponent Modeling; Meta-Learning
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/48778
Collection: Graduates_Master's Theses
Intelligent Systems and Engineering
Recommended Citation (GB/T 7714):
吴哲. 两人零和博弈中的对抗学习与适应算法研究[D]. 中国科学院自动化研究所, 2022.
Files in This Item:
File Name/Size | Document Type | Access | License
两人零和博弈中的对抗学习与适应算法研究. (6758KB) | Thesis | Restricted Access | CC BY-NC-SA