A Fast Q(λ) Algorithm Based on Second-Order TD Error
FU Qi-Ming¹, LIU Quan¹,², SUN Hong-Kun¹, GAO Long¹, LI Jing¹, WANG Hui¹
1. School of Computer Science and Technology, Soochow University, Suzhou 215006, China
2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
Abstract The Q(λ) algorithm is a classic model-free, off-policy, multi-step reinforcement learning algorithm that combines value iteration and stochastic approximation. To address the low efficiency and slow convergence of the traditional Q(λ) algorithm, the n-order TD error is defined by generalizing the one-step TD error used in the traditional Q(λ) algorithm, and a fast Q(λ) algorithm based on the second-order TD error (SOE-FQ(λ)) is presented. The algorithm adjusts the Q values with the second-order TD error and broadcasts the TD error to the whole state-action space, which speeds up convergence. In addition, the convergence rate is analyzed: under one-step updating, the number of iterations depends mainly on 1/(1-γ) and 1/ε. Finally, the SOE-FQ(λ) algorithm is applied to the random walk and mountain car problems, and the experimental results show that it achieves faster convergence and better convergence performance.
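The following is a minimal sketch of the kind of update the abstract describes: a tabular Q(λ)-style learner with eligibility traces on a small random-walk chain. The state space, parameter values, and in particular the form of the second-order TD error used here (the current one-step TD error plus the discounted TD error of the previous step) are illustrative assumptions for exposition, not the exact SOE-FQ(λ) definition from the paper.

```python
import random
from collections import defaultdict

# Sketch of a tabular Q(lambda)-style learner on a small random-walk chain.
# The "second-order TD error" below is an assumed, illustrative form; the
# precise n-order TD error and SOE-FQ(lambda) update are defined in the paper.

N_STATES = 7            # states 0..6; states 0 and 6 are terminal
ACTIONS = (-1, +1)      # move left or right
GAMMA = 0.95            # discount factor
ALPHA = 0.1             # learning rate
LAMBDA = 0.8            # trace-decay parameter
EPSILON = 0.1           # epsilon-greedy exploration rate

def step(state, action):
    """Random-walk dynamics: reward 1 only when the right end is reached."""
    next_state = state + action
    if next_state == N_STATES - 1:
        return next_state, 1.0, True
    if next_state == 0:
        return next_state, 0.0, True
    return next_state, 0.0, False

def epsilon_greedy(Q, state):
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def run_episode(Q):
    traces = defaultdict(float)      # eligibility traces e(s, a)
    state = N_STATES // 2
    prev_delta = 0.0                 # previous one-step TD error
    done = False
    while not done:
        action = epsilon_greedy(Q, state)
        next_state, reward, done = step(state, action)

        # First-order (one-step) TD error of Q-learning.
        best_next = 0.0 if done else max(Q[(next_state, a)] for a in ACTIONS)
        delta = reward + GAMMA * best_next - Q[(state, action)]

        # Assumed second-order TD error: current TD error plus the discounted
        # TD error from the previous step (illustrative only).
        delta2 = delta + GAMMA * LAMBDA * prev_delta

        # Broadcast the correction to all visited (s, a) pairs via traces.
        traces[(state, action)] += 1.0
        for sa in list(traces):
            Q[sa] += ALPHA * delta2 * traces[sa]
            traces[sa] *= GAMMA * LAMBDA

        prev_delta = delta
        state = next_state
    return Q

if __name__ == "__main__":
    Q = defaultdict(float)
    for _ in range(200):
        run_episode(Q)
    # Print the greedy state values of the non-terminal states.
    print({s: max(Q[(s, a)] for a in ACTIONS) for s in range(1, N_STATES - 1)})
```

For simplicity this sketch uses a naive (Peng-style) trace update rather than cutting traces on exploratory actions as Watkins's Q(λ) does; the choice only affects how the TD error is propagated backward along the trajectory.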