Abstract: To overcome drawbacks of the hierarchical policy gradient reinforcement learning (HPGRL) algorithm, such as convergence to local optima, a new algorithm for searching hierarchical policies is proposed, named Hierarchical Policy Search Based on PSO (PSOHPS), where PSO denotes particle swarm optimization. The designer first constructs the task decomposition graph according to the hierarchical theory of MAXQ, one of the classical hierarchical reinforcement learning techniques. The parameterized policies of all compound subtasks are then evolved by a particle swarm through direct interaction with the environment, yielding optimized action policies. Experimental results show that the algorithm is valid and that it markedly outperforms HPGRL.
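The particle swarm search over policy parameters mentioned in the abstract can be illustrated with a minimal global-best PSO sketch. This is not the authors' implementation: the function names (`pso_maximize`, `toy_return`), the swarm coefficients, and the bounds are all assumptions chosen for illustration. In PSOHPS the fitness of a particle would presumably be the return obtained by running the hierarchical policy in the environment; here a hypothetical closed-form function stands in for that return so the example is self-contained.

```python
import random


def pso_maximize(fitness, dim, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, bound=5.0, seed=0):
    """Global-best PSO that maximizes `fitness` over [-bound, bound]^dim.

    All defaults are illustrative assumptions, not values from the paper.
    Returns (best_position, best_fitness).
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its personal best; the swarm shares a global best.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp the coordinate to the search bounds.
                pos[i][d] = max(-bound, min(bound, pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


def toy_return(theta):
    """Hypothetical stand-in for the average return of a policy with
    parameters `theta`; it peaks (at 0) when theta equals `target`."""
    target = [1.0, -2.0, 0.5]
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


best_theta, best_fit = pso_maximize(toy_return, dim=3)
print(best_theta, best_fit)
```

Because the search only needs fitness values and no gradient, replacing `toy_return` with an episode rollout would not change the loop. That gradient-free property is one plausible reason a swarm might avoid the local optima the abstract attributes to HPGRL, though the sketch itself does not demonstrate this.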
PENG Zhiping, LI Shaoping. An Algorithm for Hierarchical Policy Search Based on PSO [J]. Pattern Recognition and Artificial Intelligence, 2008, 21(1): 98-103.