模式识别与人工智能
Pattern Recognition and Artificial Intelligence  2024, Vol. 37 Issue (11): 1022-1032    DOI: 10.16451/j.cnki.issn1003-6059.202411007
Researches and Applications
Offline Reinforcement Learning Algorithm Based on Selection of High-Quality Samples
HOU Yonghong1, DING Wang1, REN Yi2, DONG Hongwei2, YANG Songling1
1. School of Electrical and Information Engineering, Tianjin University, Tianjin 300072;
2. National Key Laboratory of Space Integrated Information System, Institute of Software, Chinese Academy of Sciences, Beijing 100190

Abstract  To address the over-reliance of offline reinforcement learning algorithms on the quality of dataset samples, an offline reinforcement learning algorithm based on selection of high-quality samples (SHS) is proposed. In the policy evaluation stage, higher update weights are assigned to samples with high advantage values, and a policy entropy term is added so that high-quality action samples with high probability under the data distribution are identified quickly, thereby screening out more valuable action samples. In the policy optimization stage, SHS maximizes the normalized advantage function while maintaining the policy constraint on actions within the dataset. Consequently, high-quality samples can be utilized efficiently even when the overall sample quality of the dataset is low, which improves the learning efficiency and performance of the policy. Experiments show that SHS performs well on the D4RL offline datasets in the MuJoCo-Gym environment and successfully screens out more valuable samples, which verifies its effectiveness.
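The two stages described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no formulas, so the asymmetric weight `high`, the TD3+BC-style scaling `alpha`, the squared-error constraint, and all function names below are assumptions made for illustration. The policy entropy term is left out because the abstract does not say how it enters the update.

```python
import numpy as np

def evaluation_weights(adv, high=0.9):
    """Assumed weighting rule: samples with positive advantage get the
    larger update weight `high`, the rest get 1 - high.
    `high` is a hypothetical hyperparameter, not taken from the paper."""
    return np.where(adv > 0, high, 1.0 - high)

def weighted_critic_loss(q_pred, q_target, adv, high=0.9):
    """Policy evaluation sketch: squared TD error, re-weighted per sample."""
    w = evaluation_weights(adv, high)
    return np.mean(w * (q_pred - q_target) ** 2)

def policy_objective(adv_pi, pi_actions, data_actions, alpha=2.5, eps=1e-8):
    """Policy optimization sketch (to be maximized): advantage of the
    policy's actions, scaled by its mean magnitude (one possible
    normalization), minus a squared-error constraint that keeps the
    policy's actions close to the dataset actions."""
    lam = alpha / (np.mean(np.abs(adv_pi)) + eps)
    constraint = np.mean((pi_actions - data_actions) ** 2)
    return np.mean(lam * adv_pi) - constraint

# Toy batch of four transitions (made-up numbers).
q = np.array([1.0, 2.0, 0.5, 3.0])   # Q(s, a) for dataset actions
v = np.array([1.5, 1.0, 1.0, 2.0])   # V(s)
adv = q - v                          # advantages: [-0.5, 1.0, -0.5, 1.0]
weights = evaluation_weights(adv)    # larger weight where adv > 0
```

With this weighting, the two transitions with positive advantage dominate the critic update, which is one simple way to make a low-quality dataset behave as if it contained mostly its better samples.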
Key words: Reinforcement Learning; Offline Reinforcement Learning; Distribution Shift; Policy Constraint; Value Function; Sample Selection
Received: 23 August 2024     
ZTFLH: TP 18  
Corresponding Authors: REN Yi, Ph.D., senior engineer. His research interests include reinforcement learning and intelligent game.   
About authors: HOU Yonghong, Ph.D., professor. His research interests include computer vision, video and image processing, and digital communication. DING Wang, Master's student. His research interests include offline reinforcement learning and artificial intelligence. DONG Hongwei, Ph.D., assistant professor. His research interests include machine learning and pattern recognition. YANG Songling, Master's student. His research interests include reinforcement learning and artificial intelligence.
Cite this article:   
HOU Yonghong, DING Wang, REN Yi, et al. Offline Reinforcement Learning Algorithm Based on Selection of High-Quality Samples[J]. Pattern Recognition and Artificial Intelligence, 2024, 37(11): 1022-1032.
URL:  
http://manu46.magtech.com.cn/Jweb_prai/EN/10.16451/j.cnki.issn1003-6059.202411007      OR     http://manu46.magtech.com.cn/Jweb_prai/EN/Y2024/V37/I11/1022
Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence
Address: No.350 Shushanhu Road, Hefei, Anhui Province, P.R. China Tel: 0551-65591176 Fax:0551-65591176 Email: bjb@iim.ac.cn