State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093
Department of Computer Science and Technology, Nanjing University, Nanjing 210093
Abstract: In reinforcement learning, learning algorithms frequently have to deal with both continuous state spaces and continuous action spaces in order to achieve accurate control. This paper combines the capacity of kernel methods for handling continuous state spaces with the strength of actor-critic methods in dealing with continuous action spaces. Grounded on this combination, kernel-based continuous-action actor-critic learning (KCACL) is proposed. In KCACL, the actor updates each action probability based on the reward-inaction rule, and the critic updates the state value function according to online selective kernel-based temporal difference (OSKTD) learning. The experimental results demonstrate the effectiveness of the proposed algorithm.
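To make the actor-critic structure described above concrete, the following is a minimal sketch, not the paper's implementation. It assumes the classical linear reward-inaction (L_{R-I}) rule for the actor, with the sign of the TD error standing in for the reward/penalty signal, and a plain linear TD(0) critic over fixed features as a simplified stand-in for OSKTD (which instead builds its kernel dictionary online and selectively). The class names RewardInactionActor and TDCritic are hypothetical.

```python
import numpy as np


class RewardInactionActor:
    """Actor holding a probability vector over a discretized action set.

    Assumption: on a positive TD error ("reward"), the chosen action's
    probability is reinforced via the linear reward-inaction (L_{R-I})
    rule; on a non-positive TD error ("penalty"), the probabilities are
    left unchanged -- the "inaction" part of the rule.
    """

    def __init__(self, n_actions, theta=0.1, rng=None):
        self.p = np.full(n_actions, 1.0 / n_actions)  # uniform start
        self.theta = theta                            # actor step size
        self.rng = rng or np.random.default_rng()

    def sample(self):
        # Draw an action according to the current probability vector.
        return self.rng.choice(len(self.p), p=self.p)

    def update(self, action, td_error):
        if td_error > 0:
            # p_a <- p_a + theta * (1 - p_a); others scaled by (1 - theta),
            # so the vector stays a valid probability distribution.
            self.p *= (1.0 - self.theta)
            self.p[action] += self.theta
        # td_error <= 0: no update (inaction).


class TDCritic:
    """Linear TD(0) critic over fixed features -- a simplified stand-in
    for the OSKTD critic, which builds its feature set online from
    kernels evaluated at selected states."""

    def __init__(self, n_features, alpha=0.05, gamma=0.95):
        self.w = np.zeros(n_features)  # value-function weights
        self.alpha, self.gamma = alpha, gamma

    def value(self, phi):
        return self.w @ phi

    def update(self, phi, reward, phi_next, done):
        # One-step TD target and error; the error is also returned
        # so the actor can use its sign as the reward/penalty signal.
        target = reward + (0.0 if done else self.gamma * self.value(phi_next))
        td_error = target - self.value(phi)
        self.w += self.alpha * td_error * phi
        return td_error
```

In an interaction loop, the critic's update would be called on each transition and its returned TD error passed to the actor's update, so that the value estimate drives the action-probability adjustment, mirroring the division of labor between OSKTD and the reward-inaction actor described in the abstract.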