Empirical Policy Optimization for n-Player Markov Games

Authors	Yuanheng Zhu; Weifan Li; Mengchen Zhao; Jianye Hao; Dongbin Zhao
Journal	IEEE Transactions on Cybernetics
Year	2022
Pages	doi: 10.1109/TCYB.2022.3179775
Abstract	In single-agent Markov decision processes, an agent can optimize its policy based on the interaction with the environment. In multiplayer Markov games (MGs), however, the interaction is nonstationary due to the behaviors of other players, so the agent has no fixed optimization objective. The challenge becomes finding equilibrium policies for all players. In this research, we treat the evolution of player policies as a dynamical process and propose a novel learning scheme for Nash equilibrium. The core is to evolve one’s policy according to not just its current in-game performance, but an aggregation of its performance over history. We show that for a variety of MGs, players in our learning scheme will provably converge to a point that is an approximation to Nash equilibrium. Combined with neural networks, we develop an empirical policy optimization algorithm, which is implemented in a reinforcement-learning framework and runs in a distributed way, with each player optimizing its policy based on its own observations. We use two numerical examples to validate the convergence property on small-scale MGs, and a Pong example to show the potential on large games.
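Seven major directions: sub-direction category	Machine game playing
State Key Laboratory planned research direction category	Fundamental theory of open games
Associated dataset requiring deposit	No
Document type	Journal article
Item identifier	http://ir.ia.ac.cn/handle/173211/51532
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems_Deep Reinforcement Learning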
Recommended citation (GB/T 7714)	Yuanheng Zhu, Weifan Li, Mengchen Zhao, et al. Empirical Policy Optimization for n-Player Markov Games[J]. IEEE Transactions on Cybernetics, 2022. doi: 10.1109/TCYB.2022.3179775.
APA	Yuanheng Zhu, Weifan Li, Mengchen Zhao, Jianye Hao, & Dongbin Zhao. (2022). Empirical Policy Optimization for n-Player Markov Games. IEEE Transactions on Cybernetics. doi: 10.1109/TCYB.2022.3179775.
MLA	Yuanheng Zhu, et al. "Empirical Policy Optimization for n-Player Markov Games." IEEE Transactions on Cybernetics (2022). doi: 10.1109/TCYB.2022.3179775.

Files in this item
File name/size	Document type	Version	Access type	License
Empirical_Policy_Opt (1739KB)	Journal article	Author accepted manuscript	Open access	CC BY-NC-SA