A New Pre-Training Paradigm for Offline Multi-Agent Reinforcement Learning with Suboptimal Data
Meng Linghui (1,2); Zhang Xi (1,2); Xing Dengpeng (1,2); Xu Bo (1,2)
2024-04
Conference Name: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
Conference Date: 2024.4.14-2024.4.19
Conference Place: Seoul, Korea
Abstract

Offline multi-agent reinforcement learning (MARL) under the pre-training paradigm, in which a large quantity of trajectories is used for offline pre-training followed by online deployment, has recently gained popularity. While they perform well on various tasks, conventional pre-trained decision-making models based on imitation learning typically require many expert trajectories or demonstrations, which limits the development of pre-trained policies in the multi-agent case. To address this problem, we propose a new setting in which a multi-agent policy is pre-trained offline on suboptimal (non-expert) data and then deployed online with the expectation of high rewards. For this practical setting, we propose YANHUI, a simple yet effective framework inspired by contrastive learning that uses a well-designed reward-contrast function to learn multi-agent policy representations from a dataset spanning various reward levels rather than only expert trajectories. Furthermore, we enrich the multi-agent policy pre-training with a mixture-of-experts to represent the policy dynamically. With the same quantity of offline StarCraft Multi-Agent Challenge data, YANHUI achieves significant improvements over offline MARL baselines. In particular, our method remains competitive with earlier state-of-the-art approaches even when using only 10% of the expert data used by other baselines, with the rest replaced by poor data.
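The abstract does not spell out the reward-contrast function. As a purely illustrative sketch (the function name, the pairing rule based on a return-difference threshold, and the temperature are all assumptions here, not the paper's actual YANHUI objective), an InfoNCE-style contrast over reward levels could treat trajectory embeddings with similar episode returns as positive pairs:

```python
import numpy as np

def reward_contrast_loss(embeddings, returns, temperature=0.1, threshold=0.5):
    """Illustrative InfoNCE-style contrast over reward levels.

    Trajectory embeddings whose episode returns differ by less than
    `threshold` are treated as positive pairs; all other trajectories
    act as negatives. Hypothetical sketch, not the paper's objective.
    """
    # L2-normalise embeddings so the dot product is a cosine similarity
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature
    n = len(returns)
    losses = []
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and abs(returns[i] - returns[j]) < threshold]
        if not positives:
            continue  # anchors without positives contribute nothing
        logits = np.delete(sim[i], i)            # all pairs except self
        log_denom = np.log(np.exp(logits).sum()) # InfoNCE denominator
        # average -log p(positive | anchor) over the positive set
        losses.append(np.mean([log_denom - sim[i, j] for j in positives]))
    return float(np.mean(losses))

# Toy usage: two clusters of returns (low vs high reward level)
rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 4))
rets = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
loss = reward_contrast_loss(emb, rets)
```

Because each positive's similarity appears as one term of the denominator's sum, the per-pair loss is non-negative; minimising it pulls same-reward-level trajectories together in embedding space, which is the intuition the abstract describes.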

Sub-direction Classification: Multi-agent Systems
Planning Direction of the National Key Laboratory: Multi-agent Decision-Making
Document Type: Conference Paper
Identifier: http://ir.ia.ac.cn/handle/173211/57331
Collection: Laboratory of Cognition and Decision Intelligence for Complex Systems / Auditory Models and Cognitive Computing
Affiliation:
1. Institute of Automation, Chinese Academy of Sciences
2. School of Artificial Intelligence, University of Chinese Academy of Sciences
First Author Affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended Citation
GB/T 7714
Meng Linghui,Zhang Xi,Xing Dengpeng,et al. A New Pre-Training Paradigm for Offline Multi-Agent Reinforcement Learning with Suboptimal Data[C],2024.
Files in This Item:
File Name/Size: yanhui_full_paper.pdf (964 KB)
DocType: Conference Paper
Access: Open Access
License: CC BY-NC-SA
Google Scholar
Similar articles in Google Scholar
[Meng Linghui]'s Articles
[Zhang Xi]'s Articles
[Xing Dengpeng]'s Articles
File name: yanhui_full_paper.pdf
Format: Adobe PDF

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.