Kernel-Based Continuous-Action Actor-Critic Learning<sup>*</sup>


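Abstract: Reinforcement learning algorithms often have to handle continuous state and continuous action spaces in order to achieve precise control. To this end, this paper combines the strength of the actor-critic method in handling continuous action spaces with the strength of kernel methods in handling continuous state spaces, and proposes a kernel-based continuous-action actor-critic learning algorithm (KCACL). In this algorithm, the actor updates action probabilities according to the reward-inaction principle, and the critic learns the state value function with a kernel-based online selective temporal difference algorithm. Comparative experiments verify the effectiveness of the algorithm.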


Abstract: Reinforcement learning algorithms frequently have to handle both continuous state and continuous action spaces in order to achieve accurate control. This paper combines the capacity of kernel methods for handling continuous state spaces with the advantage of the actor-critic method in dealing with continuous action spaces, and on that basis proposes kernel-based continuous-action actor-critic learning (KCACL). In KCACL, the actor updates each action probability according to the reward-inaction principle, and the critic updates the state value function using online selective kernel-based temporal difference (OSKTD) learning. Experimental results demonstrate the effectiveness of the proposed algorithm.
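To make the two components named in the abstract more concrete, here is a minimal Python sketch, not the authors' implementation. It rests on several assumptions that the abstract does not state: the actor uses the textbook linear reward-inaction rule over a finite set of candidate actions, a positive TD error is treated as the "reward" event, and the critic is a plain TD(0) learner over Gaussian kernels with a simple distance-based rule for adding kernel centers. The names `reward_inaction_update`, `KernelTDCritic`, `width`, and `novelty` are hypothetical, and the selection rule is a stand-in for the OSKTD criterion, whose details are not given here.

```python
import numpy as np


def reward_inaction_update(probs, action, td_error, lr=0.1):
    """One linear reward-inaction step over a finite set of candidate actions.

    Assumption: a positive TD error counts as "reward"; otherwise the
    probabilities are left unchanged (the "inaction" part of the rule).
    """
    probs = np.asarray(probs, dtype=float).copy()
    if td_error > 0:
        probs *= 1.0 - lr      # shrink every probability by the factor (1 - lr)
        probs[action] += lr    # chosen action ends at p + lr * (1 - p)
    return probs


class KernelTDCritic:
    """TD(0) critic with V(s) = sum_i w_i * k(s, c_i), k a Gaussian kernel.

    Assumption: a state becomes a new kernel center only if it is farther
    than `novelty` from every stored center (illustrative rule only).
    """

    def __init__(self, width=0.5, novelty=0.3, lr=0.2, gamma=0.95):
        self.centers, self.weights = [], []
        self.width, self.novelty = width, novelty
        self.lr, self.gamma = lr, gamma

    def _kernels(self, s):
        # Kernel value between s and every stored center.
        if not self.centers:
            return np.zeros(0)
        d2 = np.sum((np.asarray(self.centers) - s) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * self.width ** 2))

    def value(self, s):
        s = np.atleast_1d(np.asarray(s, dtype=float))
        return float(self._kernels(s) @ np.asarray(self.weights, dtype=float))

    def update(self, s, reward, s_next, done=False):
        s = np.atleast_1d(np.asarray(s, dtype=float))
        # Grow the dictionary only for sufficiently novel states.
        if not self.centers or np.min(
            np.linalg.norm(np.asarray(self.centers) - s, axis=1)
        ) > self.novelty:
            self.centers.append(s)
            self.weights.append(0.0)
        target = reward + (0.0 if done else self.gamma * self.value(s_next))
        td_error = target - self.value(s)
        # Gradient step on the weights of all kernels active at s.
        new_w = np.asarray(self.weights) + self.lr * td_error * self._kernels(s)
        self.weights = list(new_w)
        return td_error
```

In an actor-critic loop under these assumptions, the TD error returned by `critic.update(s, r, s_next)` would be passed to `reward_inaction_update` together with the index of the action that was taken, so the actor shifts probability toward that action only when the critic judges the outcome better than expected.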

Received: 2013-05-13

Chinese Library Classification (ZTFLH): TP 181

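Funding: Supported by the National Natural Science Foundation of China (No. 61035003, 61175042, 60721002), the National 973 Program of China (No. 2009CB320702), and the Natural Science Foundation of Jiangsu Province (No. BK2011005).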

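About the authors: CHEN Xing-Guo, male, born in 1984, Ph.D. candidate. His main research interest is reinforcement learning. E-mail: chenxgspring@gmail.com. GAO Yang (corresponding author), male, born in 1972, professor and Ph.D. supervisor. His main research interests are data mining and machine learning. E-mail: gaoy@nju.edu.cn. FAN Shun-Guo, male, born in 1989, master's student. His main research interests are reinforcement learning and transfer learning. YU Ya-Jun, male, born in 1990, master's student. His main research interest is reinforcement learning.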

Cite this article:

CHEN Xing-Guo, GAO Yang, FAN Shun-Guo, YU Ya-Jun. Kernel-Based Continuous-Action Actor-Critic Learning[J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2014, 27(2): 103-110.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/ or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2014/V27/I2/103