A Fast Q(λ) Algorithm Based on Second-Order TD Error
FU Qi-Ming (1), LIU Quan (1,2), SUN Hong-Kun (1), GAO Long (1), LI Jing (1), WANG Hui (1)
1. School of Computer Science and Technology, Soochow University, Suzhou 215006
2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012
Abstract: Q(λ) is a classic model-free, off-policy, multi-step reinforcement learning algorithm that combines value iteration with stochastic approximation. To address the low efficiency and slow convergence of the traditional Q(λ) algorithm, the n-th order TD error is defined by extending the TD error used in traditional Q(λ), and a fast Q(λ) algorithm based on the second-order TD error (SOE-FQ(λ)) is presented. The algorithm adjusts Q values with the second-order TD error and broadcasts the TD error over the whole state-action space, which speeds up convergence. In addition, the convergence rate is analyzed: under one-step updates, the number of iterations depends mainly on 1/(1-γ) and 1/ε. Finally, the SOE-FQ(λ) algorithm is applied to the random walk and mountain car problems, and the experimental results show that it achieves a faster convergence rate and better convergence performance.
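To make the idea concrete, the following is a minimal Python sketch of a tabular Q(λ)-style learner on a 19-state random walk, with eligibility traces broadcasting the TD error over visited state-action pairs. The environment (`step`), the hyperparameters, and especially the `second_order` correction (re-computing the TD error after the first-order update and applying it again through the traces) are illustrative assumptions for exposition only; they are not the authors' exact SOE-FQ(λ) definition.

```python
import numpy as np

# Minimal sketch, not the authors' code: Watkins-style tabular Q(lambda) with
# replacing eligibility traces on a 1-D random walk, plus an *assumed*
# second-order TD-error correction. The exact n-th order TD error of the
# paper is not reproduced here.

N_STATES = 19          # non-terminal states of the random walk
ACTIONS = (-1, +1)     # move left / move right

def step(state, action):
    """Random-walk dynamics: reward +1 only when exiting on the right."""
    nxt = state + action
    if nxt < 0:
        return None, 0.0          # left terminal, reward 0
    if nxt >= N_STATES:
        return None, 1.0          # right terminal, reward 1
    return nxt, 0.0

def run_episode(Q, alpha=0.1, gamma=0.95, lam=0.8, eps=0.1, second_order=True):
    E = np.zeros_like(Q)                       # eligibility traces
    s = N_STATES // 2                          # start in the middle
    while s is not None:
        a = np.random.randint(2) if np.random.rand() < eps else int(Q[s].argmax())
        s2, r = step(s, ACTIONS[a])
        q_next = 0.0 if s2 is None else Q[s2].max()
        delta = r + gamma * q_next - Q[s, a]   # first-order (one-step) TD error
        E[s, a] = 1.0                          # replacing trace
        Q += alpha * delta * E                 # broadcast update over traced (s, a) pairs
        if second_order:
            # assumed second-order correction: TD error recomputed after the
            # first-order update, then applied through the same traces
            q_next = 0.0 if s2 is None else Q[s2].max()
            delta2 = r + gamma * q_next - Q[s, a]
            Q += alpha * delta2 * E
        greedy = int(Q[s].argmax())
        E *= (gamma * lam) if a == greedy else 0.0   # cut traces after an exploratory action
        s = s2
    return Q

Q = np.zeros((N_STATES, len(ACTIONS)))
for _ in range(200):
    run_episode(Q)
print(Q.max(axis=1))   # learned greedy state values along the walk
```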