Computer game research aims to design agents that can think and make decisions autonomously in complex gaming environments. It is a rapidly growing interdisciplinary field that continues to absorb the latest advances from game theory, psychology, reinforcement learning, deep learning, and other areas. The two-player zero-sum game, the fundamental model of computer games, has long been an important problem in artificial intelligence research due to its complete interpretability and general applicability. A series of studies on two-player zero-sum games has recently achieved landmark breakthroughs in problems such as Go and Texas Hold'em. Nevertheless, designing efficient learning algorithms that remain adaptive in highly complex, strongly adversarial environments is still one of the critical challenges in two-player zero-sum games. This thesis systematically studies the adversarial learning and adaptation problem in two-player zero-sum games in terms of both environment construction and algorithm innovation. The main contributions are summarized as follows.
1. In terms of game environment construction, this thesis builds a general training and evaluation platform for two-player zero-sum games covering both discrete and continuous action spaces. The platform addresses two limitations of existing benchmarks: the lack of high-performance algorithm implementations and the lack of support for two-player game settings. It is accompanied by a rich set of baseline algorithms and well-developed evaluation metrics.
2. In terms of game algorithm design, this thesis focuses on adaptive algorithms for two-player zero-sum games, addressing two drawbacks of existing methods: equilibrium-solving approaches yield overly conservative strategies that cannot guarantee maximum payoff, while opponent-modeling approaches are difficult to construct and generalize poorly across strategies. This thesis first proposes two schemes for generating stylistically diverse opponent strategies. Building on these, and inspired by ideas from meta-learning, it proposes an adaptive algorithm training framework. The framework uses a meta-strategy update method that adjusts the meta-model's network weights for the current opponent type, enabling fast adaptation.
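The "adapt meta-parameters to the current opponent type" idea above can be illustrated with a minimal Reptile-style meta-learning sketch. Everything here is an illustrative assumption, not the thesis's actual framework: opponent types are reduced to scalar targets, the policy to a single parameter, and the adaptation loss to a quadratic.

```python
import numpy as np

# Minimal Reptile-style sketch (hypothetical stand-in for the meta-strategy
# update): meta-parameters are trained so a few inner gradient steps suffice
# to fit any single opponent type drawn from the training distribution.
rng = np.random.default_rng(0)

def loss(theta, target):
    # Toy adaptation objective against one opponent "type" (a scalar target).
    return 0.5 * (theta - target) ** 2

def grad(theta, target):
    return theta - target

def inner_adapt(theta, target, steps=5, lr=0.1):
    """Inner loop: a few gradient steps specializing to one opponent type."""
    for _ in range(steps):
        theta = theta - lr * grad(theta, target)
    return theta

def meta_train(opponent_targets, meta_iters=200, meta_lr=0.5):
    """Reptile outer loop: move meta-parameters toward each adapted solution."""
    theta = 0.0
    for _ in range(meta_iters):
        target = rng.choice(opponent_targets)       # sample an opponent style
        adapted = inner_adapt(theta, target)        # specialize to it
        theta = theta + meta_lr * (adapted - theta) # pull meta-params toward it
    return theta

# Meta-train on a distribution of opponent styles, then adapt quickly
# to a previously unseen style with only a few inner steps.
theta = meta_train(opponent_targets=[-1.0, 0.0, 1.0])
new_opponent = 0.7
fast = inner_adapt(theta, new_opponent, steps=5)
```

After meta-training, the same small number of inner steps that would barely move a randomly initialized parameter brings the meta-initialized one much closer to the new opponent's optimum; this is the generic fast-adaptation effect the framework relies on, sketched here in its simplest form.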
3. This thesis conducts extensive experiments, including ablation studies of each module, in multiple complex adversarial scenarios with discrete and continuous action spaces. The results show that the proposed algorithm effectively overcomes the drawbacks of existing methods and adapts quickly to opponents of unknown style, offering a new approach to payoff maximization in two-player zero-sum games.