Knowledge Commons of Institute of Automation, CAS
Learning Deep Decentralized Policy Network by Collective Rewards for Real-Time Combat Game
Peixi Peng¹; Junliang Xing¹; Lili Cao¹; Lisen Mu²; Chang Huang²
2019
Conference Name | International Joint Conference on Artificial Intelligence
Conference Date | August 10-16, 2019
Conference Location | Macao, China
Abstract | The task of a real-time combat game is to coordinate multiple units to defeat enemies controlled by a given opponent in a real-time combat scenario. It is difficult to design a high-level Artificial Intelligence (AI) program for such a task due to its extremely large state-action space and real-time requirements. This paper formulates the task as a collective decentralized partially observable Markov decision process, and designs a Deep Decentralized Policy Network (DDPN) to model the policies. To train DDPN effectively, a novel two-stage learning algorithm is proposed which …
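Only the abstract of the paper is recorded here. As a rough, self-contained illustration of the decentralized-policy-with-collective-reward idea the abstract describes, the sketch below gives each unit the same shared linear-softmax policy over its own local observation; all names, shapes, and the architecture itself are assumptions for illustration, not the authors' actual DDPN.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class DecentralizedPolicy:
    """Toy decentralized policy: every unit runs the same linear-softmax
    policy on its own local observation (parameter sharing), so the number
    of units can change without any architectural change."""

    def __init__(self, obs_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(obs_dim, n_actions))
        self.b = np.zeros(n_actions)

    def act_probs(self, local_obs):
        # local_obs: (n_units, obs_dim) -> per-unit action distributions
        # of shape (n_units, n_actions).
        return softmax(local_obs @ self.W + self.b)

def collective_reward(unit_rewards):
    # A single scalar reward shared by all units, as opposed to
    # per-unit credit assignment.
    return float(np.sum(unit_rewards))
```

In a training loop, the shared parameters would be updated from the single collective reward signal; the decentralized structure only fixes how actions are computed at execution time.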
Keywords | Multi-agent Learning; Deep Decentralized Policy Network; Real-time Combat Game
Indexed By | SCI
Sub-direction Classification (Seven Major Directions) | Theories and Methods of Decision Intelligence
Document Type | Conference Paper
Identifier | http://ir.ia.ac.cn/handle/173211/26156
Collection | Intelligent Systems and Engineering
Corresponding Author | Junliang Xing
Affiliations | 1. Institute of Automation, Chinese Academy of Sciences; 2. Horizon Robotics
First Author Affiliation | Institute of Automation, Chinese Academy of Sciences
Corresponding Author Affiliation | Institute of Automation, Chinese Academy of Sciences
Recommended Citation (GB/T 7714) | Peixi Peng, Junliang Xing, Lili Cao, et al. Learning Deep Decentralized Policy Network by Collective Rewards for Real-Time Combat Game[C], 2019.
Files in This Item
File Name/Size | Document Type | Version Type | Open Access Type | License
IJCAI19StarCraftFina(762KB) | Conference Paper | – | Open Access | CC BY-NC-SA
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.