Pattern Recognition and Artificial Intelligence  2025, Vol. 38 Issue (2): 177-191    DOI: 10.16451/j.cnki.issn1003-6059.202502006
Hessian Aided Probabilistic Policy Gradient Method
HU Lei, LI Yongqiang, FENG Yu, FENG Yuanjing
College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China

Abstract  Policy gradient methods in reinforcement learning are widely applied to continuous decision-making problems because of their generality, but their practical performance is persistently limited by poor sample efficiency caused by high gradient variance. In this paper, a Hessian-aided probabilistic policy gradient method (HAPPG) is proposed, with a bimodal gradient estimation mechanism designed on top of the probabilistic gradient estimator. Historical momentum is added to the large-batch estimate to restrain the fluctuation of gradient descent, and a variance-reduced estimate based on the Hessian-aided technique is constructed by introducing second-order curvature information of the policy parameters into the small-batch estimate. Theoretical analysis demonstrates that HAPPG achieves an O(ε⁻³) sample complexity under non-convex optimization conditions, matching the best-known convergence rate among existing methods. Experimental results validate its superior performance across multiple benchmark control tasks. Furthermore, the Hessian-aided probabilistic policy gradient estimator is combined with proximal policy optimization (PPO) by embedding the adaptive learning-rate mechanism of the Adam optimizer, yielding HAP-PPO. HAP-PPO outperforms PPO, and the designed gradient estimator can be applied to further enhance mainstream reinforcement learning algorithms.
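The two-branch estimator described in the abstract can be summarized informally as follows. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the helpers estimate_gradient (a REINFORCE-style large-batch policy gradient estimate) and hvp (a small-batch Hessian-vector-product estimate of ∇²J(θ)·v) are hypothetical, and the switching probability, momentum coefficient, and batch sizes are placeholder values.

```python
import torch

# Illustrative sketch of a probabilistic (bimodal) policy gradient estimator
# with a Hessian-aided small-batch branch, in the spirit of the abstract.
# `estimate_gradient` and `hvp` are assumed helpers, not the paper's code.

def happg_step(theta, theta_prev, g_prev, estimate_gradient, hvp,
               p=0.1, beta=0.9, big_batch=256, small_batch=16):
    """Return an updated gradient estimate g_t at parameters theta.

    With probability p, draw a fresh large-batch estimate and blend it with
    the historical estimate (momentum restrains optimization fluctuation).
    Otherwise, correct g_prev cheaply along the parameter displacement using
    second-order curvature information (a Hessian-vector product).
    """
    if torch.rand(1).item() < p:
        # Large-batch branch with historical momentum.
        g_new = estimate_gradient(theta, batch_size=big_batch)
        g_t = beta * g_prev + (1.0 - beta) * g_new
    else:
        # Small-batch Hessian-aided branch: evaluate curvature at a random
        # point on the segment between theta_prev and theta, so the added
        # term approximates the gradient change between the two iterates.
        v = theta - theta_prev
        a = torch.rand(1).item()
        g_t = g_prev + hvp(theta_prev + a * v, v, batch_size=small_batch)
    return g_t
```

An ascent step such as θ ← θ + η·g_t (or, in HAP-PPO, a PPO update driven by the Adam optimizer) would then be taken with this estimate; the exact construction, mixing weights, and batch schedules are specified in the paper.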
Key words: Machine Learning; Reinforcement Learning; Policy Gradient; Variance Reduction
Received: 20 November 2024     
CLC Number (ZTFLH): TP18
Fund: National Natural Science Foundation of China (No. 62073294, U2341216)
Corresponding Author: LI Yongqiang, Ph.D., associate professor. His research interests include reinforcement learning and control theory.
About authors: HU Lei, Master's student. His research interests include reinforcement learning and intelligent games. FENG Yu, Ph.D., professor. His research interests include multi-agent games, deep reinforcement learning, and optimal and robust control. FENG Yuanjing, Ph.D., professor. His research interests include medical image processing, machine vision and brain intelligence.
Cite this article:   
HU Lei, LI Yongqiang, FENG Yu, et al. Hessian Aided Probabilistic Policy Gradient Method[J]. Pattern Recognition and Artificial Intelligence, 2025, 38(2): 177-191.
URL: http://manu46.magtech.com.cn/Jweb_prai/EN/10.16451/j.cnki.issn1003-6059.202502006 OR http://manu46.magtech.com.cn/Jweb_prai/EN/Y2025/V38/I2/177