Abstract: Average reward reinforcement learning is an important undiscounted optimality framework. However, most existing work addresses tasks with discrete state spaces. This paper studies how to combine function approximation with average reward learning, and modifies the parameter update condition to suit continuous state spaces. It also examines closely the performance of G-learning and its insensitivity to learning parameters. Finally, experimental results and the accompanying analysis are presented. The experiments confirm that the solutions of R-learning and G-learning are prone to diverge when ε is relatively small. The results also show that Tile Coding is an effective feature extraction method for function approximation and can serve as a baseline against which other methods are compared.
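To make the combination of Tile Coding and average reward learning concrete, the following is a minimal sketch, not the method of this paper. It assumes a scalar state in [0, 1], a linear value function over tile features, and the standard tabular R-learning rule carried over unchanged, with the average reward estimate ρ updated only when the action taken was greedy before the update. The modified update condition studied in the paper is not reproduced here. The names `TileCoder1D`, `RLearningTiles`, `alpha`, `beta` and `epsilon` are hypothetical and chosen only for illustration.

```python
import random


class TileCoder1D:
    """Hypothetical minimal tile coder for a scalar state in [low, high]."""

    def __init__(self, n_tilings=4, n_tiles=8, low=0.0, high=1.0):
        self.n_tilings = n_tilings
        self.n_tiles = n_tiles
        self.low = low
        self.high = high
        # Each tiling is shifted by an offset, so it needs one extra tile.
        self.n_features = n_tilings * (n_tiles + 1)

    def active(self, x):
        """Return the index of the single active tile in each tiling."""
        scaled = (x - self.low) / (self.high - self.low) * self.n_tiles
        indices = []
        for t in range(self.n_tilings):
            tile = int(scaled + t / self.n_tilings)
            tile = min(max(tile, 0), self.n_tiles)  # clip out-of-range states
            indices.append(t * (self.n_tiles + 1) + tile)
        return indices


class RLearningTiles:
    """R-learning with one linear weight vector per action (illustrative)."""

    def __init__(self, coder, n_actions, alpha=0.1, beta=0.01,
                 epsilon=0.1, seed=0):
        self.coder = coder
        self.n_actions = n_actions
        # Divide the step size by the number of tilings, a common convention.
        self.alpha = alpha / coder.n_tilings
        self.beta = beta
        self.epsilon = epsilon  # exploration rate of the epsilon-greedy policy
        self.rho = 0.0          # estimate of the average reward
        self.w = [[0.0] * coder.n_features for _ in range(n_actions)]
        self.rng = random.Random(seed)

    def q(self, x, a):
        """Approximate action value: sum of the weights of the active tiles."""
        return sum(self.w[a][i] for i in self.coder.active(x))

    def max_q(self, x):
        return max(self.q(x, a) for a in range(self.n_actions))

    def act(self, x):
        """Epsilon-greedy action selection."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.q(x, a))

    def update(self, x, a, r, x_next):
        q_sa = self.q(x, a)
        max_cur = self.max_q(x)
        max_next = self.max_q(x_next)
        # Assumption: "greedy" is judged on the values before this update.
        was_greedy = q_sa >= max_cur
        # Average-adjusted temporal difference error.
        delta = r - self.rho + max_next - q_sa
        for i in self.coder.active(x):
            self.w[a][i] += self.alpha * delta
        if was_greedy:
            self.rho += self.beta * (r - self.rho + max_next - max_cur)
```

A typical loop would call `act` to choose an action, apply it in the environment, and pass the observed reward and next state to `update`. In a continuous state space, exact equality between an action value and the maximum is a fragile test, which is one plausible reason the greedy condition needs care under function approximation. The sketch sidesteps this by comparing values before the weights change.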