Abstract Policy gradient methods in reinforcement learning are widely applied to continuous decision-making problems due to their generality. However, their practical performance is consistently constrained by low sample efficiency caused by high gradient variance. In this paper, a Hessian-aided probabilistic policy gradient method (HAPPG) is proposed, and a bimodal gradient estimation mechanism is designed based on the probabilistic gradient estimator. Historical momentum is added to the large-batch estimation to restrain optimization fluctuations of gradient descent, and a variance-reduced estimation based on the Hessian-aided technique is constructed by introducing second-order curvature information of the policy parameters into the small-batch estimation. Theoretical analysis demonstrates that HAPPG achieves an O(ε⁻³) sample complexity under non-convex optimization conditions, matching the best-known convergence rate among existing methods. Experimental results validate its superior performance across multiple benchmark control tasks. Furthermore, the Hessian-aided probabilistic policy gradient estimator is combined with proximal policy optimization (PPO) by embedding the adaptive learning-rate mechanism of the Adam optimizer, resulting in HAP-PPO. HAP-PPO outperforms PPO, and the designed gradient estimator can be applied to further enhance mainstream reinforcement learning algorithms.
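To make the bimodal estimator structure concrete, the following is a minimal sketch, assuming a synthetic stochastic quadratic objective in place of a policy-gradient task; the function names (stochastic_grad, stochastic_hvp), batch sizes, and coefficients are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Sketch of a probabilistic (PAGE-style) gradient estimator with a
# Hessian-aided small-batch correction, shown on a toy stochastic
# quadratic objective f(theta) = 0.5 * theta^T A theta.
rng = np.random.default_rng(0)
dim = 5
A = np.diag(rng.uniform(0.5, 2.0, size=dim))      # true curvature of the toy objective

def stochastic_grad(theta, batch_size):
    """Noisy gradient of f, averaged over a batch of simulated samples."""
    noise = rng.normal(0.0, 0.1, size=(batch_size, dim))
    return A @ theta + noise.mean(axis=0)

def stochastic_hvp(theta_prev, theta_curr, batch_size):
    """Hessian-aided correction: a noisy Hessian-vector product with the
    parameter displacement, evaluated at a random point on the segment
    between consecutive iterates (constant Hessian here, so the point is unused)."""
    alpha = rng.uniform()
    _ = (1.0 - alpha) * theta_prev + alpha * theta_curr
    noise = rng.normal(0.0, 0.1, size=(batch_size, dim, dim))
    return (A + noise.mean(axis=0)) @ (theta_curr - theta_prev)

# Illustrative hyper-parameters: switch probability, step size, momentum.
p, lr, beta = 0.2, 0.05, 0.9
big_batch, small_batch = 256, 8

theta = rng.normal(size=dim)
v = stochastic_grad(theta, big_batch)             # large-batch initialization

for t in range(200):
    theta_prev, theta = theta, theta - lr * v     # gradient step with current estimate
    if rng.uniform() < p:
        # Mode 1: large-batch estimate blended with historical momentum.
        v = beta * v + (1.0 - beta) * stochastic_grad(theta, big_batch)
    else:
        # Mode 2: small-batch recursive update with Hessian-aided correction.
        v = v + stochastic_hvp(theta_prev, theta, small_batch)

print("final |theta|:", np.linalg.norm(theta))    # should shrink toward the minimizer at 0
```

In this sketch the small-batch branch exploits the identity ∇f(θ_t) − ∇f(θ_{t−1}) = ∫₀¹ ∇²f(θ_α)(θ_t − θ_{t−1}) dα, estimating the integral with a single stochastic Hessian-vector product, which is the role the abstract assigns to the second-order curvature information.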
Fund: National Natural Science Foundation of China (No. 62073294, U2341216)
Corresponding Authors:
LI Yongqiang, Ph.D., associate professor. His research interests include reinforcement learning and control theory.
About authors: HU Lei, master's student. His research interests include reinforcement learning and intelligent games. FENG Yu, Ph.D., professor. His research interests include multi-agent games, deep reinforcement learning, and optimal and robust control. FENG Yuanjing, Ph.D., professor. His research interests include medical image processing, machine vision and brain intelligence.