Heuristically Accelerated State Backtracking Q-Learning Based on Cost Analysis

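Abstract: Because learning action policies with reinforcement learning algorithms is time-consuming, a heuristic reinforcement learning method based on state backtracking is proposed. The repeated states that arise during reinforcement learning are analyzed, and by comparing the selection strategies of repeated actions during state backtracking, a cost function is introduced to describe the importance of repeated actions. Combining the action reward with the action cost, a new definition of the heuristic function is proposed. This heuristic function emphasizes the importance of actions to speed up learning, while using the cost function to compute the cost of each action choice and thereby reduce unnecessary exploration, so that learning efficiency improves steadily. The action selection strategy based on the cost function is proved. Two simulation scenarios are built, and the algorithm is applied to robot path planning simulations. Experimental results show that the heuristic reinforcement learning method based on state backtracking balances the rewards obtained against the costs incurred and effectively improves the convergence speed of Q-learning.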
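The abstract describes the heuristic only qualitatively. The sketch below shows one way such a heuristic can enter action selection, following the heuristically accelerated Q-learning (HAQL) scheme of Bianchi et al., in which the greedy choice maximizes Q(s, a) + ξH(s, a) while the Q-learning update itself is unchanged. The form H(s, a) = r̄(s, a) − λ·c(s, a), the weights `XI` and `LAMBDA`, and the helper names `heuristic`, `select_action` and `q_update` are hypothetical assumptions for illustration; they are not the paper's definitions.

```python
import random
from collections import defaultdict

# Hypothetical sketch of HAQL-style action selection (Bianchi et al.):
# choose argmax_a [Q(s, a) + XI * H(s, a)], with H combining an average
# observed reward and an action cost. XI, LAMBDA and the form of H
# are assumptions, not the paper's definitions.
XI = 1.0      # hypothetical weight of the heuristic term
LAMBDA = 0.5  # hypothetical weight of the action cost


def heuristic(avg_reward, cost, s, a, lam=LAMBDA):
    """Hypothetical H(s, a): reward bonus minus weighted action cost."""
    return avg_reward[(s, a)] - lam * cost[(s, a)]


def select_action(q, avg_reward, cost, s, actions, epsilon=0.1, xi=XI, rng=random):
    """Epsilon-greedy on Q(s, a) + xi * H(s, a); H only biases the choice."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions,
               key=lambda a: q[(s, a)] + xi * heuristic(avg_reward, cost, s, a))


def q_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update; the heuristic is not used here."""
    best_next = max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])


# Demo on a single grid state (0, 0) with four moves.
actions = ["up", "down", "left", "right"]
q = defaultdict(float)
avg_reward = defaultdict(float)
cost = defaultdict(float)
avg_reward[((0, 0), "right")] = 1.0  # "right" has paid off before
cost[((0, 0), "left")] = 3.0         # "left" has often led back to visited states

chosen = select_action(q, avg_reward, cost, (0, 0), actions, epsilon=0.0)
print(chosen)  # right
q_update(q, (0, 0), chosen, 1.0, (0, 1), actions)
print(q[((0, 0), "right")])  # 0.1
```

Because H only biases which action is chosen and never enters the update target, the Q-values still follow the standard Q-learning rule. With ε = 0 in the demo, the agent picks `right`, the action with a reward bonus and no cost, and avoids `left`, whose cost makes its score negative.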

Keywords: cost analysis, heuristic function, state backtracking, Q-learning

Abstract: Since learning action strategies is time-consuming for reinforcement learning algorithms, a heuristic reinforcement learning algorithm based on state backtracking is presented. By analyzing the repetitive states and comparing the action policies of reinforcement learning, a cost function is defined to indicate the importance of repetitive actions. A probability-based heuristic function is presented by combining an action reward with an action cost. The proposed algorithm uses the heuristic function to reinforce the importance of an action and speed up learning, and at the same time uses the cost function to measure the feasibility of an action and reduce unnecessary exploration, thus steadily improving learning efficiency. This cost-based action strategy is proved to be reasonable. Two simulation scenarios are built, and the experimental results of robot games show that the proposed algorithm learns by trading off rewards against costs and effectively improves the convergence of Q-learning.
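The state-backtracking idea can be illustrated in its simplest form: within one episode, an action is charged a cost whenever it leads back to a state the agent has already visited. The sketch below assumes this visit-based rule, a fixed `penalty` per backtrack, and a hypothetical function name `backtracking_costs`; the paper's cost function, which compares the selection strategies of repeated actions, may be defined differently.

```python
from collections import defaultdict


def backtracking_costs(trajectory, cost=None, penalty=1.0):
    """Hypothetical per-episode cost bookkeeping.

    trajectory: ordered list of (state, action, next_state) transitions.
    An action is charged `penalty` whenever its next_state was already
    visited earlier in the same episode, i.e. the agent backtracked.
    Costs accumulate across episodes if the same `cost` dict is passed in.
    """
    if cost is None:
        cost = defaultdict(float)
    if not trajectory:
        return cost
    visited = {trajectory[0][0]}  # the start state counts as visited
    for s, a, s_next in trajectory:
        if s_next in visited:
            cost[(s, a)] += penalty
        visited.add(s_next)
    return cost


episode = [
    ((0, 0), "right", (0, 1)),
    ((0, 1), "left", (0, 0)),   # returns to the start state: charged
    ((0, 0), "down", (1, 0)),
    ((1, 0), "right", (1, 1)),
]
costs = backtracking_costs(episode)
print(dict(costs))  # {((0, 1), 'left'): 1.0}
```

Passing the same `cost` mapping across episodes lets repeated backtracking accumulate, so the count can serve as the action-cost term in a heuristic that discourages revisiting states.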

Key words: Cost Analysis, Heuristic Function, State Backtracking, Q-Learning

Received: 2012-08-13

CLC number: TP181

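Funding: Supported by the National Natural Science Foundation of China (Nos. 61070143, 61101248) and the Fundamental Research Funds for the Central Universities (No. K5051203003).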

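About the authors: FANG Min (corresponding author), female, born in 1965, professor. Her main research interests include intelligent information processing and network technology. E-mail: mfang@mail.xidian.edu.cn. LI Hao, male, born in 1988, M.S. His main research interests include artificial intelligence and machine learning.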

Cite this article:

FANG Min, LI Hao. Heuristically Accelerated State Backtracking Q-Learning Based on Cost Analysis. Pattern Recognition and Artificial Intelligence, 2013, 26(9): 838-844.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/ or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2013/V26/I9/838
