Intelligent Decision-Making in Games Based on Deep Reinforcement Learning (基于深度强化学习的游戏智能决策)
邵坤
Subtype: Doctoral
Thesis Advisor: 赵冬斌 (Zhao Dongbin)
2019-05-22
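Degree Grantor: University of Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Name: Doctor of Engineering
Degree Discipline: Control Theory and Control Engineering
Keyword: deep reinforcement learning; deep learning; reinforcement learning; intelligent decision-making; game artificial intelligence; multi-agent systems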
Abstract

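Artificial intelligence (AI) research has made great progress in recent years. Games are a suitable research platform for AI and have attracted many researchers who use them to test new algorithms and models. Intelligent decision-making in games is a challenging research area: under complex conditions, game decisions require exploring feasible strategies through continual prediction and evaluation. In addition, game data is massive, high-dimensional, and abstract. Representing this data effectively and extracting usable features from it, so as to formulate an optimal strategy, is the key to winning. At the same time, strengthening an agent's understanding of complex game environments and its ability to make decisions in them is the core of intelligent decision-making in games. In game AI, perception and decision-making abilities are both measures of an agent's intelligence. The rapid development of machine learning techniques, represented by deep reinforcement learning (DRL), has provided new approaches to intelligent decision-making in games. In recent years, DRL-based agents have achieved remarkable results in a wide range of games. From two-dimensional perfect-information single-agent games to three-dimensional imperfect-information multi-agent games, DRL has reached the level of human players in these settings and has defeated top professional players in games such as Go, StarCraft, and Dota 2. Meanwhile, DRL algorithms represented by the deep Q-network (DQN) and asynchronous advantage actor-critic (A3C) have developed further in both basic theory and practical application.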

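Building on a survey of the state of research in intelligent decision-making in games and in deep reinforcement learning, this thesis starts from perfect-information single-agent board games and studies move prediction in Gomoku. It then addresses imperfect-information single-agent perception and decision-making, studying end-to-end decision and control in a first-person 3D shooter. Next, for multi-agent learning and control in complex dynamic environments, it studies multi-unit micromanagement in the real-time strategy game StarCraft. Finally, for the credit assignment problem in fully cooperative multi-agent settings, it studies cooperative decision-making among multiple agents in a pursuit game.

The main work and contributions of this thesis are as follows: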

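1. For perfect-information board games, addressing the Gomoku game-playing problem, we propose a Gomoku move prediction model that uses deep learning to predict moves under perfect-information game states. With an efficiently designed network model and hyperparameters, the model reaches professional-level prediction accuracy on a Gomoku dataset, which validates the ability of deep convolutional neural networks to represent game-record data.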

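2. For first-person 3D shooter games, addressing high-dimensional image input and imperfect information, we propose an end-to-end actor-critic deep reinforcement learning model for perception and decision-making in first-person games. Fusing multi-frame input with decision information from past time steps addresses partial observability, while a parallel multi-threaded mechanism trains agents to optimize their decisions across multiple game environments, giving stable updates for deep reinforcement learning. The method is validated on visual navigation and combat shooting tasks in a first-person shooter.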

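3. For imperfect-information multi-agent real-time strategy games, addressing multi-unit cooperative decision-making and generalization across scenarios, we propose a parameter-sharing multi-agent reinforcement learning algorithm combined with curriculum transfer learning, and successfully train StarCraft micromanagement units to defeat the game's built-in AI. To handle the complex state space, we design an efficient state representation that reduces its dimensionality. To handle the variety of scenarios and unit types in the game, we propose transfer-learning pretraining and progressive curriculum learning, which improve learning speed and learning performance. The method is compared and validated across different types of micromanagement scenarios.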

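4. For first-person imperfect-information multi-agent games, addressing credit assignment among fully cooperative agents, we propose a counterfactual-reward deep reinforcement learning method. By computing the difference in reward that results when an agent takes other actions, the method determines each agent's contribution to the global reward, which effectively solves the multi-agent credit assignment problem. Agents trained with counterfactual-reward deep reinforcement learning in maze scenarios of different sizes achieve end-to-end perception and decision-making in a multi-agent game, validating the effectiveness of the method.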

Other Abstract

Artificial intelligence (AI) research has made great progress in recent years. Games are suitable platforms for AI research, and many researchers test new algorithms and models in them. Intelligent decision-making in games is a very challenging research field. Decision-making under complicated game conditions requires constantly exploring feasible policies through prediction and evaluation. In addition, the data in games is vast, high-dimensional, and abstract. Representing the data effectively and extracting useful features, so as to work out the optimal strategy, is the key to winning. At the same time, strengthening the agent's understanding of the complex game environment and its decision-making ability is the core of intelligent decision-making in games. In game AI, perception and decision-making abilities are both used to measure the intelligence of an agent. With the rapid development of machine learning technology, especially deep reinforcement learning (DRL), intelligent decision-making in complex games with high-dimensional inputs has become feasible. Recently, deep reinforcement learning agents have made remarkable achievements in a large number of games. From two-dimensional perfect-information single-agent games to three-dimensional imperfect-information multi-agent games, deep reinforcement learning has reached human-level performance and has defeated top professional players in Go, StarCraft, and Dota 2. At the same time, deep reinforcement learning algorithms, represented by the deep Q-network (DQN) and asynchronous advantage actor-critic (A3C), have developed further in both theoretical foundations and practical applications.

In this thesis, we first review the current research status and related work in game AI and deep reinforcement learning. Then, we focus on perfect-information board games and study move prediction in Gomoku. After that, we study end-to-end control in imperfect-information first-person shooter games. Next, we focus on the multi-agent learning problem in complex dynamic environments and study the micromanagement of multiple units in the real-time strategy game StarCraft. Finally, we focus on the credit assignment problem in fully cooperative multi-agent games and study the cooperative decision-making of multiple agents in Predator and Prey. The main contents and contributions of this thesis are as follows.

1. For the perfect-information board game Gomoku, we propose a move prediction model that uses deep learning to predict expert moves in given perfect-information game states. With an efficient network architecture and well-chosen hyperparameters, our method achieves professional-level prediction accuracy on the RenjuNet dataset, demonstrating the strong representation ability of deep convolutional neural networks on Gomoku data.

2. For the first-person shooter game ViZDoom, we propose an actor-critic deep reinforcement learning method that handles high-dimensional visual inputs and imperfect information, and we train agents end to end to navigate and to play against the built-in game AI. We address partial observability by integrating multiple frames with historical decision-making information. Using a parallel multi-threaded mechanism, we synchronously train agents in multiple game environments and optimize the deep reinforcement learning policies steadily. Finally, we validate the proposed method on visual navigation and battle tasks.

3. For the partially observable real-time strategy game StarCraft, we address cooperative decision-making and generalization by proposing a parameter-sharing multi-agent reinforcement learning algorithm, and we successfully train units to fight against the built-in game AI. To tackle the complicated state space, we put forward an efficient state representation method that reduces the dimension of the state space. To generalize well across multiple game scenarios and unit types, we use curriculum transfer learning to progressively improve learning speed and performance. Finally, we validate the proposed method in various micromanagement scenarios.

4. For the first-person, imperfect-information, fully cooperative multi-agent game Predator and Prey, we put forward a counterfactual reward multi-agent deep reinforcement learning algorithm to solve the credit assignment problem among multiple agents. The algorithm computes the difference in reward that would result if an agent took other actions, and uses it to determine each agent's contribution to the global reward. Finally, we train multiple agents with this algorithm to produce end-to-end cooperative behaviors, and validate the effectiveness of the proposed method.
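The reward-difference idea in contribution 4 can be illustrated with a minimal sketch. This is not the thesis's implementation: the function `counterfactual_rewards`, the toy reward `toy_reward`, and the choice to average over all alternative actions as the baseline are assumptions made here for illustration only. Each agent's credit is the global reward actually obtained minus the mean global reward that would have resulted had that agent alone taken a different action.

```python
from typing import Callable, List, Tuple

JointAction = Tuple[int, ...]


def counterfactual_rewards(
    global_reward: Callable[[JointAction], float],
    joint_action: JointAction,
    num_actions: int,
) -> List[float]:
    """Per-agent credit as a reward difference (illustrative assumption).

    For agent i, compare the actual global reward with the mean global
    reward obtained when only agent i's action is swapped for each of its
    alternatives. Requires num_actions >= 2.
    """
    actual = global_reward(joint_action)
    credits = []
    for i, a_i in enumerate(joint_action):
        alternatives = [a for a in range(num_actions) if a != a_i]
        # Mean global reward over agent i's counterfactual actions,
        # holding every other agent's action fixed.
        baseline = sum(
            global_reward(joint_action[:i] + (alt,) + joint_action[i + 1:])
            for alt in alternatives
        ) / len(alternatives)
        credits.append(actual - baseline)
    return credits


def toy_reward(joint_action: JointAction) -> float:
    # Hypothetical team reward: action 1 means "chase", action 0 means
    # "idle", and the team earns 1.0 for every agent that chases.
    return float(sum(joint_action))


# Agent 0 chases, agent 1 idles: agent 0 receives positive credit,
# agent 1 receives negative credit for the same global reward.
print(counterfactual_rewards(toy_reward, (1, 0), num_actions=2))
```

In a learned setting the global reward would come from the environment or a learned critic rather than a closed-form function, so the counterfactual terms would have to be estimated instead of evaluated directly.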

Subject Area: Automatic Control Technology
MOST Discipline Catalogue: Engineering::Control Science and Engineering
Pages: 122
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/23946
Collection: State Key Laboratory of Management and Control for Complex Systems_Deep Reinforcement Learning
Recommended Citation
GB/T 7714
邵坤. 基于深度强化学习的游戏智能决策[D]. 中国科学院自动化研究所. 中国科学院大学,2019.
Files in This Item:
File Name/Size: 基于深度强化学习的游戏智能决策.pdf (13984KB)
DocType: Thesis
Access: Open Access
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.