Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
2013, Vol. 26, Issue 3: 282-292
Original Article
A Fast Q(λ) Algorithm Based on Second-Order TD Error
FU Qi-Ming (1), LIU Quan (1,2), SUN Hong-Kun (1), GAO Long (1), LI Jing (1), WANG Hui (1)
1. School of Computer Science and Technology, Soochow University, Suzhou 215006
2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012

Abstract: The Q(λ) algorithm is a classic model-free, off-policy, multi-step reinforcement learning algorithm that combines value iteration with stochastic approximation. To address the low efficiency and slow convergence of the traditional Q(λ) algorithm, the n-order TD error is defined on the basis of the TD error used in traditional Q(λ), and a fast Q(λ) algorithm based on the second-order TD error (SOE-FQ(λ)) is proposed. The algorithm adjusts the Q values using the second-order TD error and propagates the TD error over the whole state-action space, which speeds up convergence. In addition, the convergence rate is analyzed: under one-step updates, the number of iterations depends mainly on $\frac{1}{1-\gamma}$ and $\frac{1}{\varepsilon}$. Finally, SOE-FQ(λ) is applied to the Random Walk and Mountain Car problems, and the experimental results show that it converges faster and achieves better convergence performance.
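For reference, the "traditional Q(λ)" baseline that the abstract contrasts with SOE-FQ(λ) can be sketched as tabular Watkins's Q(λ) with accumulating eligibility traces, driven by the ordinary one-step TD error δ = r + γ max_a' Q(s', a') − Q(s, a). This is a minimal sketch under stated assumptions, not the authors' method: the paper's second-order TD error is not defined in this abstract and is not reproduced here. The 7-state chain environment, the function name `watkins_q_lambda`, and all parameter values are hypothetical choices for illustration only.

```python
import random


def watkins_q_lambda(n_states=7, episodes=500, alpha=0.1, gamma=0.9,
                     lam=0.8, epsilon=0.1, seed=0):
    """Tabular Watkins's Q(lambda) on a hypothetical chain 'random walk'.

    States 0 and n_states-1 are terminal; reaching the right end pays 1,
    every other transition pays 0. Actions: 0 = left, 1 = right.
    """
    rng = random.Random(seed)
    moves = (-1, +1)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    terminal = {0, n_states - 1}

    def greedy(s):
        # Greedy action with random tie-breaking.
        best = max(Q[s])
        return rng.choice([a for a in (0, 1) if Q[s][a] == best])

    def eps_greedy(s):
        # Behaviour policy: explore with probability epsilon.
        return rng.randrange(2) if rng.random() < epsilon else greedy(s)

    for _ in range(episodes):
        e = [[0.0, 0.0] for _ in range(n_states)]  # eligibility traces
        s = n_states // 2
        a = eps_greedy(s)
        while s not in terminal:
            s2 = s + moves[a]
            r = 1.0 if s2 == n_states - 1 else 0.0
            if s2 in terminal:
                target, a2, next_is_greedy = r, None, False
            else:
                a2 = eps_greedy(s2)
                a_star = greedy(s2)
                # An exploratory pick that ties with the greedy value counts as greedy.
                next_is_greedy = Q[s2][a2] == Q[s2][a_star]
                target = r + gamma * Q[s2][a_star]
            delta = target - Q[s][a]      # one-step (first-order) TD error
            e[s][a] += 1.0                # accumulating trace
            # Watkins: decay traces after greedy actions, cut them after exploratory ones.
            decay = gamma * lam if next_is_greedy else 0.0
            for i in range(n_states):
                for j in (0, 1):
                    Q[i][j] += alpha * delta * e[i][j]
                    e[i][j] *= decay
            s, a = s2, a2
    return Q
```

With `Q = watkins_q_lambda()`, the "right" action at the start state (`Q[3][1]`) should end up with a higher value than "left" (`Q[3][0]`), since only the right-hand terminal is rewarded. A variant that adjusts Q with a different error signal, such as the paper's second-order TD error, would presumably change how `delta` is computed; that variant is not shown here.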
Key words: Reinforcement Learning, Markov Decision Process, Second-Order TD Error, Eligibility Trace, Q(λ) Algorithm
Received: 09 May 2012     
CLC number: TP181
Cite this article:
FU Qi-Ming, LIU Quan, SUN Hong-Kun, et al. A Fast Q(λ) Algorithm Based on Second-Order TD Error[J]. Pattern Recognition and Artificial Intelligence, 2013, 26(3): 282-292.
URL: http://manu46.magtech.com.cn/Jweb_prai/EN/ OR http://manu46.magtech.com.cn/Jweb_prai/EN/Y2013/V26/I3/282
Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence