Pattern Recognition and Artificial Intelligence
Pattern Recognition and Artificial Intelligence, 2023, Vol. 36, Issue 4: 300-312    DOI: 10.16451/j.cnki.issn1003-6059.202304002
Papers and Reports
Weight Adaptive Generative Adversarial Imitation Learning Based on Noise Contrastive Estimation
GUAN Weifan1,2, ZHANG Xi1
1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190;
2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049

Abstract: Traditional imitation learning requires that all expert demonstrations be optimal and of extremely high quality. This restriction both makes data collection harder and limits the scenarios in which the algorithms can be applied. To address this problem, weight adaptive generative adversarial imitation learning based on noise contrastive estimation (GLANCE) is proposed, which maintains high performance when the quality of the expert demonstrations is inconsistent. First, a feature extractor is trained with noise contrastive estimation to improve the feature distribution of suboptimal expert demonstrations. Then, learnable weight coefficients are assigned to the expert demonstrations, and generative adversarial imitation learning is performed on the demonstrations after they are redistributed according to these weights. Finally, a ranking loss is computed on evaluation data whose relative ranking is known, and the weight coefficients are optimized by gradient descent to improve the data distribution. Experiments on multiple continuous control tasks show that, when the quality of the expert demonstrations is inconsistent, GLANCE needs only 5% of the expert demonstration dataset as evaluation data to achieve strong performance.
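The second and third steps of the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the abstract does not give the loss formulas, so the softmax parameterization of the weights, the weighted cross-entropy form of the discriminator loss, the pairwise logistic ranking loss, and all function names and toy values below are assumptions chosen for illustration. The noise contrastive feature extractor from the first step is omitted.

```python
import math

def softmax(logits):
    """Map unconstrained per-demonstration logits to weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_discriminator_loss(expert_probs, policy_probs, weights):
    """GAIL-style cross-entropy with each expert term scaled by its weight (assumed form).

    expert_probs[i]: discriminator output in (0, 1) for expert sample i
    policy_probs[j]: discriminator output in (0, 1) for policy sample j
    """
    expert_term = -sum(w * math.log(p) for w, p in zip(weights, expert_probs))
    policy_term = -sum(math.log(1.0 - p) for p in policy_probs) / len(policy_probs)
    return expert_term + policy_term

def ranking_loss(logits, ranked_pairs):
    """Pairwise logistic loss over (better, worse) index pairs (assumed form)."""
    total = sum(math.log(1.0 + math.exp(logits[w] - logits[b]))
                for b, w in ranked_pairs)
    return total / len(ranked_pairs)

def ranking_grad(logits, ranked_pairs):
    """Analytic gradient of ranking_loss with respect to the logits."""
    grad = [0.0] * len(logits)
    n = len(ranked_pairs)
    for b, w in ranked_pairs:
        s = 1.0 / (1.0 + math.exp(logits[b] - logits[w]))  # sigmoid(l_w - l_b)
        grad[b] -= s / n
        grad[w] += s / n
    return grad

# Hypothetical toy data: three demonstrations, where the evaluation set says
# demonstration 0 is better than 1, and 1 is better than 2.
logits = [0.0, 0.0, 0.0]
ranked_pairs = [(0, 1), (1, 2)]

initial_loss = ranking_loss(logits, ranked_pairs)
for _ in range(200):  # plain gradient descent with step size 0.5
    grad = ranking_grad(logits, ranked_pairs)
    logits = [l - 0.5 * g for l, g in zip(logits, grad)]
final_loss = ranking_loss(logits, ranked_pairs)

# Weights used to rescale the expert terms of the discriminator loss.
weights = softmax(logits)
```

In this toy setting the ranking loss pushes the logit of the better demonstration above that of the worse one, so the resulting weights give higher-quality demonstrations more influence on `weighted_discriminator_loss`. How the paper actually couples the ranking loss to the weights is not stated in the abstract.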
Key words: Reinforcement Learning; Imitation Learning; Noise Contrastive Estimation; Adaptive Weight
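Received: 2022-11-25
Chinese Library Classification (ZTFLH): TP 18
Fund: Supported by the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (No. 2020AAA0103400)
Corresponding author: ZHANG Xi, Ph.D., associate professor. Research interests: machine learning and reinforcement learning. E-mail: xi.zhang@ia.ac.cn
About the author: GUAN Weifan, master's student. Research interests: reinforcement learning and imitation learning. E-mail: guanweifan2020@ia.ac.cn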
Cite this article:
GUAN Weifan, ZHANG Xi. Weight Adaptive Generative Adversarial Imitation Learning Based on Noise Contrastive Estimation. Pattern Recognition and Artificial Intelligence, 2023, 36(4): 300-312.
Link to this article:
http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.202304002 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2023/V36/I4/300