Abstract: Current convergence analyses of reinforcement learning methods apply mainly to discrete-state problems. Analyses of continuous-state reinforcement learning methods are limited to simple linear quadratic regulator (LQR) control problems. After analyzing two convergent reinforcement learning methods for the LQR control problem, a new method that requires only partial model information is proposed to overcome the shortcomings of both. In this method, a recursive least-squares temporal-difference (TD) method estimates the parameters of the value function, and a recursive least-squares method estimates the greedily improved policy. A convergence proof is given for the proposed policy iteration method in the ideal case. Simulation results show that the method converges to an optimal control policy.
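The value-function estimation step mentioned above can be illustrated with a minimal sketch of recursive least-squares TD(0) on a scalar LQR problem. This is not the paper's algorithm: the scalar dynamics `x' = a*x + b*u`, the fixed linear policy `u = -k*x`, the single quadratic feature `x**2`, the discount factor `gamma`, and the names `rls_td_evaluate` and `closed_form_p` are all assumptions made for illustration. The sketch covers policy evaluation only and does not include the policy-improvement step.

```python
def closed_form_p(a, b, k, q, rho, gamma):
    """Exact p in V(x) = p * x**2 for the fixed policy u = -k * x (assumed setup)."""
    closed_loop = a - b * k
    stage_cost = q + rho * k * k
    return stage_cost / (1.0 - gamma * closed_loop ** 2)


def rls_td_evaluate(a, b, k, q, rho, gamma, starts, steps=20, p0=1e6):
    """Recursive least-squares TD(0) with the single feature phi(x) = x**2.

    theta estimates p in V(x) = p * x**2; P is the scalar inverse of the
    accumulated LSTD matrix, initialised to a large value p0.
    """
    theta = 0.0
    P = p0
    for x in starts:                      # one short episode per start state
        for _ in range(steps):
            u = -k * x                    # fixed linear policy (assumption)
            cost = q * x * x + rho * u * u
            x_next = a * x + b * u        # deterministic scalar dynamics
            phi = x * x
            d = phi - gamma * x_next * x_next   # phi(x) - gamma * phi(x')
            gain = P * phi / (1.0 + d * P * phi)
            theta += gain * (cost - d * theta)  # correct by the TD residual
            P -= gain * d * P                   # rank-one inverse update
            x = x_next
    return theta


if __name__ == "__main__":
    params = dict(a=0.9, b=0.5, k=0.4, q=1.0, rho=0.1, gamma=0.95)
    estimate = rls_td_evaluate(starts=[1.0, -2.0, 0.5], **params)
    print(estimate, closed_form_p(**params))
```

Because the assumed system is noise-free, every transition satisfies the Bellman equation exactly, so the estimate settles close to the closed-form value after very few updates. A stochastic system would need many more samples.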