Abstract: A new method, named exploration density (ED) inspection, is presented. The method discovers useful options by inspecting the influence of each state on the agent's ability to explore the state space. Simulation results show that the proposed algorithm improves reinforcement learning performance. The method is task-independent and requires no prior knowledge, and the created options can be shared directly among different tasks in the same environment.
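The abstract does not specify how exploration density is computed or how subgoals are selected, so the following is only a minimal illustrative sketch of an ED-style option-discovery pipeline. It assumes ED is a visit-frequency statistic gathered from random-exploration trajectories in a toy two-room grid world, and that the highest-density states are taken as candidate subgoals for options; every function name, threshold, and the ED formula itself are assumptions, not the authors' algorithm.

```python
# Hypothetical sketch: visit-count "exploration density" over random walks,
# with the highest-density states proposed as option subgoals.
# All formulas and parameters here are illustrative assumptions.
import random
from collections import Counter


def random_walk(start, neighbors, steps, rng):
    """Collect one random-exploration trajectory starting from `start`."""
    state, path = start, [start]
    for _ in range(steps):
        state = rng.choice(neighbors(state))
        path.append(state)
    return path


def exploration_density(start, neighbors, episodes=200, steps=50, seed=0):
    """Assumed ED statistic: normalized visit counts over many random walks."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(episodes):
        counts.update(random_walk(start, neighbors, steps, rng))
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def discover_subgoals(density, top_k=3):
    """Pick the top-k highest-density states as candidate option subgoals."""
    return [s for s, _ in sorted(density.items(), key=lambda kv: -kv[1])[:top_k]]


if __name__ == "__main__":
    # Toy two-room grid world with a wall at x == 3 and a doorway at (3, 2).
    width, height, doorway = 7, 5, (3, 2)

    def neighbors(s):
        x, y = s
        candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

        def passable(p):
            px, py = p
            if not (0 <= px < width and 0 <= py < height):
                return False
            return px != 3 or (px, py) == doorway

        valid = [p for p in candidates if passable(p)]
        return valid or [s]  # stay in place if no neighbor is passable

    ed = exploration_density((0, 0), neighbors, episodes=500, steps=80)
    print("candidate subgoals:", discover_subgoals(ed))
```

Since the discovered subgoals depend only on the environment's connectivity and not on any reward function, options built toward them could, as the abstract claims, be reused across different tasks in the same environment.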