Weight Adaptive Generative Adversarial Imitation Learning Based on Noise Contrastive Estimation (基于噪声对比估计的权重自适应对抗生成式模仿学习)

	Guan Weifan1,2; Zhang Xi1
Journal	Pattern Recognition and Artificial Intelligence
ISSN	1003-6059
	2023-04
Volume	36 Issue: 4 Pages: 300-312
Article type	Journal article
Abstract	Traditional imitation learning requires that every expert demonstration be an optimal demonstration of extremely high quality. This requirement makes data collection harder and limits the scenarios in which the algorithms can be applied. To address this, the paper proposes weight adaptive generative adversarial imitation learning based on noise contrastive estimation (GLANCE), which maintains high performance in tasks where the quality of expert demonstrations is inconsistent. First, a feature extractor is trained with noise contrastive estimation to improve the feature distribution of suboptimal expert demonstrations. Then, learnable weight coefficients are assigned to the expert demonstrations, and generative adversarial imitation learning is performed on the demonstrations redistributed according to these weights. Finally, a ranking loss is computed on evaluation data whose relative ranking is known, and the weight coefficients are optimized by gradient descent to improve the data distribution. Experiments on multiple continuous control tasks show that, when expert demonstration quality is inconsistent, GLANCE needs only 5% of the expert demonstration dataset as evaluation data to achieve strong performance.
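The abstract names two ingredients that a small numerical sketch can make concrete: a discriminator loss in which expert demonstrations carry learnable weights, and a ranking loss on evaluation data that drives those weights by gradient descent. The Python sketch below is only an illustration under stated assumptions and is not the authors' implementation: the softmax normalization, the logistic pairwise ranking loss, the direct application of the ranking loss to the weight logits, and all function names are hypothetical choices, and the noise contrastive feature extractor is omitted entirely.

```python
import numpy as np

def normalized_weights(logits):
    """Softmax over per-demonstration logits so the weights sum to 1 (assumed form)."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    e = np.exp(shifted)
    return e / e.sum()

def weighted_discriminator_loss(d_expert, d_policy, weights, eps=1e-8):
    """Cross-entropy discriminator loss with weighted expert samples.

    d_expert, d_policy: discriminator outputs in (0, 1) for expert and policy samples.
    The sign convention (expert labelled 1) is an assumption of this sketch.
    """
    expert_term = -np.sum(weights * np.log(d_expert + eps))
    policy_term = -np.mean(np.log(1.0 - d_policy + eps))
    return expert_term + policy_term

def pairwise_ranking_loss(logits, better, worse):
    """Logistic loss that prefers logits[better[k]] > logits[worse[k]] for each pair k."""
    margin = logits[better] - logits[worse]
    return np.mean(np.log1p(np.exp(-margin)))

def ranking_loss_grad(logits, better, worse):
    """Analytic gradient of pairwise_ranking_loss with respect to the logits."""
    margin = logits[better] - logits[worse]
    # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m)), averaged over the pairs
    coeff = -1.0 / (1.0 + np.exp(margin)) / len(better)
    grad = np.zeros_like(logits)
    np.add.at(grad, better, coeff)   # accumulates correctly for repeated indices
    np.add.at(grad, worse, -coeff)
    return grad

# Toy evaluation data: demonstration 0 is known to rank above 1, 1 above 2, 2 above 3.
better = np.array([0, 1, 2])
worse = np.array([1, 2, 3])

logits = np.zeros(4)
loss_before = pairwise_ranking_loss(logits, better, worse)
for _ in range(200):
    logits -= 0.5 * ranking_loss_grad(logits, better, worse)  # plain gradient descent
loss_after = pairwise_ranking_loss(logits, better, worse)

weights = normalized_weights(logits)

# Made-up discriminator outputs, used only to show how the weights enter the loss.
d_expert = np.array([0.9, 0.8, 0.7, 0.6])
d_policy = np.array([0.2, 0.1])
disc_loss = weighted_discriminator_loss(d_expert, d_policy, weights)
print(weights, loss_before, loss_after, disc_loss)
```

In this toy setting the ranking loss only reorders the weight logits; in the method described by the abstract, the ranking loss is computed on evaluation data, so how it reaches the weights is likely more indirect than shown here.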
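Keywords	Reinforcement learning; imitation learning; noise contrastive estimation; adaptive weight
Indexed by	EI
Language	Chinese
Seven major directions, sub-direction classification	Reinforcement and evolutionary learning
State Key Laboratory planned research direction	Intelligent computing and learning
Associated dataset to be deposited	No
Document type	Journal article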
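Item identifier	http://ir.ia.ac.cn/handle/173211/52280
Collection	Laboratory of Cognition and Decision Intelligence for Complex Systems_Efficient Intelligent Computing and Learning
Corresponding author	Zhang Xi
Author affiliations	1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences 2. School of Artificial Intelligence, University of Chinese Academy of Sciences
First author affiliation	National Laboratory of Pattern Recognition
Corresponding author affiliation	National Laboratory of Pattern Recognition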
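Recommended citation GB/T 7714	Guan Weifan, Zhang Xi. Weight Adaptive Generative Adversarial Imitation Learning Based on Noise Contrastive Estimation[J]. Pattern Recognition and Artificial Intelligence, 2023, 36(4): 300-312.
APA	Guan, W., & Zhang, X. (2023). Weight adaptive generative adversarial imitation learning based on noise contrastive estimation. Pattern Recognition and Artificial Intelligence, 36(4), 300-312.
MLA	Guan, Weifan, and Xi Zhang. "Weight Adaptive Generative Adversarial Imitation Learning Based on Noise Contrastive Estimation." Pattern Recognition and Artificial Intelligence 36.4 (2023): 300-312.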

Files in this item
File name/size	Document type	Version	Access	License
基于噪声对比估计的权重自适应对抗生成式模 (1849KB)	Journal article	Author accepted manuscript	Open access	CC BY-NC-SA