An Average Reward Reinforcement Learning Algorithm with Tile Coding<sup>*</sup>


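Abstract: Average reward reinforcement learning is an important class of undiscounted optimality frameworks in reinforcement learning, yet most existing work addresses discrete domains. This paper attempts to combine average reward reinforcement learning algorithms with function approximation to solve problems with continuous state spaces, and modifies the parameter update conditions of R-learning and G-learning to suit the change of state domain. In addition, the performance of G-learning combined with function approximation, and its sensitivity to various parameters, are studied specifically. Finally, experimental results and analysis are given. The experimental results show that the solutions of R-learning and G-learning tend to diverge when ε is small. They also demonstrate the effectiveness of the feature extraction method Tile Coding, which can serve as a reference standard for other feature extraction methods.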
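As a rough illustration of the feature extraction step named in the abstract, the following is a minimal sketch of tile coding for a bounded continuous state. The function name `tile_indices`, the uniform offset scheme, and the default sizes (`num_tilings`, `tiles_per_dim`) are assumptions made for this example; the paper's own tiling layout is not given on this page.

```python
import math


def tile_indices(state, lows, highs, num_tilings=8, tiles_per_dim=8):
    """Return one active tile index per tiling for a continuous state.

    Hypothetical helper: each tiling is a uniform grid shifted by a
    fraction of one tile width, so nearby states share most tiles.
    """
    dims = len(state)
    # One extra cell per dimension absorbs the shift of each tiling.
    cells = tiles_per_dim + 1
    tiling_size = cells ** dims
    active = []
    for t in range(num_tilings):
        index = 0
        for d in range(dims):
            # Scale the coordinate to the range [0, tiles_per_dim].
            scaled = (state[d] - lows[d]) / (highs[d] - lows[d]) * tiles_per_dim
            # Shift tiling t by t / num_tilings of a tile width.
            coord = int(math.floor(scaled + t / num_tilings))
            coord = min(max(coord, 0), cells - 1)
            index = index * cells + coord
        # Offset so that every tiling owns a disjoint block of indices.
        active.append(t * tiling_size + index)
    return active


center = tile_indices((0.5, 0.5), (0.0, 0.0), (1.0, 1.0))
near = tile_indices((0.51, 0.5), (0.0, 0.0), (1.0, 1.0))
far = tile_indices((0.9, 0.9), (0.0, 0.0), (1.0, 1.0))
```

Each state activates exactly one tile per tiling, so a value estimate is a sum of `num_tilings` weights. States that are close together share many active tiles and therefore generalize to each other, while distant states share none.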


Keywords: Reinforcement Learning, Markov Decision Process (MDP), R-Learning, G-Learning, Average Reward

Abstract: Average reward reinforcement learning is an important undiscounted optimality framework. However, most existing work addresses tasks with discrete state spaces. This paper studies how to combine function approximation with average reward learning, and modifies the parameter update conditions to suit continuous state spaces. In addition, the performance of G-learning and its sensitivity to learning parameters are studied closely. Finally, experimental results and the relevant analysis are presented. The experimental results show that the solutions of R-learning and G-learning are prone to diverge when ε is relatively small. The results also show that Tile Coding is an effective feature extraction method for function approximation and can be taken as a comparative standard for other methods.
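To make the combination of average reward learning and function approximation concrete, here is a minimal sketch of one R-learning step with a linear value function over binary features (for example, active tile indices). It uses the standard tabular condition for updating the average reward estimate, namely that the action taken was greedy. The paper's modified update condition for continuous spaces is not described on this page and is not reproduced here. The names `q_value` and `r_learning_step`, the step sizes `alpha` and `beta`, and the choice to evaluate all values before the weight update are assumptions of this example.

```python
def q_value(weights, active_tiles, action):
    """Linear action value: sum of the weights of the active features."""
    return sum(weights[action][i] for i in active_tiles)


def r_learning_step(weights, rho, tiles, action, reward, next_tiles,
                    num_actions, alpha=0.1, beta=0.01):
    """Apply one R-learning update in place and return the new rho.

    weights: one list of feature weights per action (hypothetical layout).
    rho: current estimate of the average reward per step.
    """
    q_sa = q_value(weights, tiles, action)
    greedy_now = max(q_value(weights, tiles, a) for a in range(num_actions))
    greedy_next = max(q_value(weights, next_tiles, a) for a in range(num_actions))
    # Standard tabular condition: was the chosen action greedy?
    was_greedy = q_sa == greedy_now

    # Undiscounted temporal-difference error relative to the average reward.
    delta = reward - rho + greedy_next - q_sa
    # Split the step size across the active features.
    step = alpha / len(tiles)
    for i in tiles:
        weights[action][i] += step * delta

    # Update the average reward estimate only after greedy actions.
    if was_greedy:
        rho += beta * (reward - rho + greedy_next - greedy_now)
    return rho


weights = [[0.0] * 10 for _ in range(2)]
rho = r_learning_step(weights, 0.0, [0, 1], 0, 1.0, [2, 3], num_actions=2)
rho_after_exploring = r_learning_step(weights, rho, [0, 1], 1, 1.0, [2, 3], num_actions=2)
```

In the second call the chosen action is not greedy, so only the weights change and `rho` is left untouched. This gating is the update condition that the abstract says is adapted when moving from discrete to continuous state spaces.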

Key words: Reinforcement Learning, Markov Decision Process (MDP), R-Learning, G-Learning, Average Reward

Received: 2007-12-11

Chinese Library Classification (CLC) number: TP181

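Funding: Supported by the National Natural Science Foundation of China (No.60775046) and the Science Fund for Creative Research Groups of the National Natural Science Foundation of China (No.60721002)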

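About the authors: WANG Wei-Wei, male, born in 1984, master's student. His main research interest is machine learning. E-mail: elegate@gmail.com. CHEN Xing-Guo, male, born in 1984, master's student. His main research interest is machine learning. GAO Yang, male, born in 1972, associate professor. His main research interests are machine learning, agents, and image processing.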

Cite this article:

WANG Wei-Wei, CHEN Xing-Guo, GAO Yang. An Average Reward Reinforcement Learning Algorithm with Tile Coding [J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2008, 21(4): 446-452.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/ or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2008/V21/I4/446