Offline Reinforcement Learning Algorithm Based on Selection of High-Quality Samples
HOU Yonghong1, DING Wang1, REN Yi2, DONG Hongwei2, YANG Songling1
Normalized average return curves of SHS with and without the policy entropy penalty term