An Algorithm for Hierarchical Policy Search Based on PSO<sup>*</sup>


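Abstract: To address problems of the hierarchical policy gradient reinforcement learning algorithm (HPGRL), such as its tendency to become trapped in local optima, a hierarchical policy search algorithm (PSOHPS) is proposed. First, the designer constructs the subtask hierarchy following the ideas of MAXQ, a classical hierarchical reinforcement learning method. Then, through direct interaction with the environment, PSOHPS uses a particle swarm, which has strong global search capability, to evolve the parameterized policies of each composite subtask and thereby obtain optimized action policies. Finally, experiments on negotiation deadlock resolution verify that PSOHPS is effective and that its performance is clearly better than that of HPGRL.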


Keywords: hierarchical reinforcement learning, particle swarm optimization (PSO), hierarchical policy, negotiation deadlock

Abstract: To overcome drawbacks of the hierarchical policy gradient reinforcement learning algorithm (HPGRL), such as convergence to local optima, a new algorithm for searching hierarchical policies is proposed, named Hierarchical Policy Search Based on PSO (PSOHPS). The designers create the task decomposition graph according to the hierarchical theory of MAXQ, one of the classical hierarchical reinforcement learning techniques. The parameterized policies of all compound subtasks are then evolved by a particle swarm, through direct interaction with the environment, to obtain optimized action policies. Experimental results demonstrate that the algorithm is valid and that it clearly outperforms HPGRL.
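The core idea in the abstract, a particle swarm evolving the parameters of a policy, can be illustrated with a minimal sketch. This is not the authors' PSOHPS implementation. It assumes a single compound subtask with three child actions, a softmax policy parameterized by `theta`, and a made-up `REWARDS` table standing in for the return that would be measured by interacting with the environment. The function names and the PSO constants (`w`, `c1`, `c2`) are illustrative assumptions; the update rule is the standard global-best PSO with an inertia weight.

```python
import math
import random

# Hypothetical toy task: one compound subtask choosing among three child actions.
# REWARDS is an assumed expected reward per action, not data from the paper.
REWARDS = [0.2, 1.0, 0.5]


def softmax(theta):
    """Turn a parameter vector into action probabilities."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def expected_return(theta):
    """Fitness of a parameter vector: expected reward under the softmax policy."""
    return sum(p * r for p, r in zip(softmax(theta), REWARDS))


def pso_policy_search(fitness, dim, n_particles=20, iters=100,
                      w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO that maximizes `fitness` over `dim` policy parameters."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                # best position seen by each particle
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # best position seen by the swarm

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


best_theta, best_fit = pso_policy_search(expected_return, dim=3)
```

In a hierarchical setting, one would presumably run such a search over the policy parameters of every compound subtask in the MAXQ task graph, with the fitness estimated from sampled episodes rather than computed in closed form as in this toy example.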

Key words: Hierarchical Reinforcement Learning, Particle Swarm Optimization (PSO), Hierarchical Policies, Negotiation Deadlock

Received: 2006-12-07

Chinese Library Classification (ZTFLH): TP181

Funding: Supported by the Natural Science Foundation of Guangdong Province (No. 06029281, 05011905)

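About the authors: PENG Zhiping, male, born in 1969, Ph.D., associate professor. His main research interests are machine learning, intelligent commerce, and multi-agent technology. E-mail: mmxypzhp@yahoo.com.cn. LI Shaoping, female, born in 1974, master's degree. Her main research interest is applied artificial intelligence technology.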

Cite this article:

PENG Zhiping, LI Shaoping. An Algorithm for Hierarchical Policy Search Based on PSO [J]. Pattern Recognition and Artificial Intelligence, 2008, 21(1): 98-103 (in Chinese)
(彭志平,李绍平. 一种基于PSO的分层策略搜索算法[J]. 模式识别与人工智能, 2008, 21(1): 98-103)

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/ or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2008/V21/I1/98
