Option Subgoal Discovery Algorithm Based on Exploration Density


Abstract: A new method, exploration density (ED) inspection, is presented. The method discovers useful options by inspecting how each state influences the agent's ability to explore the state space. Simulation results show that the proposed algorithm improves performance in reinforcement learning. The method is task-independent and requires no prior knowledge, and the options it creates can be shared directly among different tasks in the same environment.
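For readers unfamiliar with the term, an option in hierarchical reinforcement learning is usually described by three parts: an initiation set, an internal policy, and a termination condition. The sketch below is illustrative only and is not the authors' algorithm. It assumes the subgoal state has already been discovered by some means (the abstract does not specify how ED is computed, so that step is not shown), and the grid world, the action names, and the helpers `make_subgoal_option` and `run_option` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Tuple

State = Tuple[int, int]  # hypothetical grid-cell coordinates (x, y)
Action = str             # hypothetical action labels

# Assumed deterministic moves in an obstacle-free grid.
MOVES = {"east": (1, 0), "west": (-1, 0), "north": (0, 1), "south": (0, -1)}


@dataclass(frozen=True)
class Option:
    """An option: initiation set, internal policy, termination condition."""
    initiation_set: FrozenSet[State]
    policy: Callable[[State], Action]
    subgoal: State

    def can_start(self, state: State) -> bool:
        return state in self.initiation_set

    def terminates(self, state: State) -> bool:
        # Stop at the subgoal, or when the agent leaves the initiation set.
        return state == self.subgoal or state not in self.initiation_set


def make_subgoal_option(subgoal: State, region: Iterable[State]) -> Option:
    """Build an option that walks greedily toward an already-known subgoal."""
    def policy(state: State) -> Action:
        x, y = state
        gx, gy = subgoal
        if x < gx:
            return "east"
        if x > gx:
            return "west"
        return "north" if y < gy else "south"

    # The option may start anywhere in the region except at the subgoal itself.
    return Option(frozenset(region) - {subgoal}, policy, subgoal)


def run_option(option: Option, state: State, max_steps: int = 100) -> State:
    """Follow the option's policy until its termination condition holds."""
    if not option.can_start(state):
        raise ValueError("option cannot be initiated in this state")
    for _ in range(max_steps):
        dx, dy = MOVES[option.policy(state)]
        state = (state[0] + dx, state[1] + dy)
        if option.terminates(state):
            break
    return state


region = {(x, y) for x in range(5) for y in range(5)}
option = make_subgoal_option((4, 2), region)
final_state = run_option(option, (0, 0))
```

Because the option depends only on the subgoal and the region, not on any reward function, the same object could be reused by different tasks in the same environment, which is the kind of sharing the abstract refers to.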

Key words: Hierarchical Reinforcement Learning; Option; Exploration Density (ED)

Received: 14 March 2006

ZTFLH: TP181


Cite this article:

MENG JiangHua,ZHU JiHong,SUN ZengQi. Discovery Algorithm for Option Based on Exploration Density[J]. , 2007, 20(2): 236-240.

URL:

http://manu46.magtech.com.cn/Jweb_prai/EN/ OR http://manu46.magtech.com.cn/Jweb_prai/EN/Y2007/V20/I2/236

