Weight Adaptive Generative Adversarial Imitation Learning Based on Noise Contrastive Estimation
GUAN Weifan1,2, ZHANG Xi1
1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; 2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
Abstract: Traditional imitation learning requires expert demonstrations of extremely high quality. This restriction not only increases the difficulty of data collection but also limits the application scenarios of such algorithms. To address this problem, weight adaptive generative adversarial imitation learning based on noise contrastive estimation (GLANCE) is proposed to maintain high performance in scenarios where the quality of expert demonstrations is inconsistent. Firstly, a feature extractor is trained by noise contrastive estimation to improve the feature distribution of suboptimal expert demonstrations. Then, weight coefficients are assigned to the expert demonstrations, and generative adversarial imitation learning is performed on the demonstrations after they are redistributed according to these weight coefficients. Finally, a ranking loss is computed from evaluation data with known relative rankings, and the weight coefficients are optimized by gradient descent to improve the data distribution. Experiments on multiple continuous control tasks show that GLANCE achieves superior performance while requiring only 5% of the expert demonstration dataset as evaluation data, even when the quality of the expert demonstrations is inconsistent.
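The abstract outlines three components: an NCE-trained feature extractor, weight coefficients over the expert demonstrations used inside the generative adversarial imitation learning objective, and a ranking loss on a small ranked evaluation subset that drives gradient-descent updates of those weights. The PyTorch sketch below illustrates only the last two ideas under stated assumptions; the names (Discriminator, weighted_gail_disc_loss, ranking_loss) are illustrative stand-ins, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    """GAIL-style discriminator over pre-extracted (state, action) features."""
    def __init__(self, feat_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, feats):
        return self.net(feats).squeeze(-1)  # logits

def weighted_gail_disc_loss(disc, expert_feats, policy_feats, weight_logits):
    """Logistic discriminator loss in which each expert sample is scaled by a
    learnable weight coefficient (normalized into a distribution), so that
    low-quality demonstrations can be down-weighted."""
    w = torch.softmax(weight_logits, dim=0)            # redistribute the demonstrations
    expert_term = -(w * F.logsigmoid(disc(expert_feats))).sum()
    policy_term = -F.logsigmoid(-disc(policy_feats)).mean()
    return expert_term + policy_term

def ranking_loss(weight_logits, better_idx, worse_idx, margin=0.1):
    """Margin ranking loss on the weight coefficients: demonstrations known from
    the small evaluation subset to be higher quality should receive larger weights."""
    diff = weight_logits[better_idx] - weight_logits[worse_idx]
    return F.relu(margin - diff).mean()

if __name__ == "__main__":
    feat_dim, n_expert, n_policy = 8, 32, 32
    disc = Discriminator(feat_dim)
    weight_logits = torch.zeros(n_expert, requires_grad=True)  # one weight per demonstration
    opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
    opt_w = torch.optim.Adam([weight_logits], lr=1e-2)

    expert_feats = torch.randn(n_expert, feat_dim)  # stand-ins for NCE-extracted features
    policy_feats = torch.randn(n_policy, feat_dim)
    better_idx = torch.tensor([0, 1, 2])            # toy ranked evaluation pairs
    worse_idx = torch.tensor([3, 4, 5])

    # Alternate: update the discriminator on the reweighted data, then update the
    # weight coefficients by gradient descent on the ranking loss.
    for _ in range(10):
        opt_d.zero_grad()
        weighted_gail_disc_loss(disc, expert_feats, policy_feats,
                                weight_logits.detach()).backward()
        opt_d.step()

        opt_w.zero_grad()
        ranking_loss(weight_logits, better_idx, worse_idx).backward()
        opt_w.step()
```

In the full method the expert and policy features would come from the NCE-trained extractor and the discriminator output would serve as the reward signal for policy optimization; those pieces are omitted from this sketch.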