Dynamic Fuzzy Q-Learning and Its Real-Time Application in Embedded System<sup>*</sup>

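Abstract: This paper introduces a new online adaptive dynamic fuzzy Q reinforcement learning algorithm. Using feedback from the environment, the system evaluates the decisions it has already made, assigns rewards and penalties, updates its Q-values, and automatically adjusts the structure and parameters of the fuzzy controller online. The action output of each active rule is determined from the current environment state and the Q-values of the fuzzy reinforcement learner, and fuzzy inference then produces a continuous output action. An extended greedy search strategy ensures that every output action of every control rule is explored early in learning, which avoids getting trapped in a local optimum. Eligibility traces are combined with a meta-learning rule to speed up learning. A real-time control implementation on an embedded platform, together with a comparison against results from related work, verifies the advantages of the algorithm.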
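The abstract does not spell out the form of the extended greedy search strategy. The following is a minimal sketch only, assuming the simplest reading: for a given fuzzy rule, any action that has never been tried is forced first, and selection then falls back to ordinary ε-greedy. The function name `select_action`, the `visits` counters, and the default `epsilon` are hypothetical and not taken from the paper.

```python
import random

def select_action(q_row, visits, epsilon=0.1, rng=random):
    """Pick an action index for one fuzzy rule.

    q_row  : list of Q-values, one per candidate action of the rule
    visits : list of visit counts, same length as q_row (hypothetical bookkeeping)
    """
    # Assumption: an action never tried for this rule is selected first,
    # so every action is explored at least once early in learning.
    untried = [a for a, n in enumerate(visits) if n == 0]
    if untried:
        return untried[0]
    # Otherwise plain epsilon-greedy: explore with probability epsilon.
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    # Exploit: index of the largest Q-value (first one on ties).
    return max(range(len(q_row)), key=lambda a: q_row[a])
```

The caller is expected to increment `visits[a]` after each selection; once all counts are nonzero the function behaves as ordinary ε-greedy.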

Keywords: fuzzy control, online self-organizing, Q reinforcement learning, embedded system, real-time control

Abstract: A new dynamic fuzzy Q-learning (DFQL) method is presented that is capable of tuning fuzzy inference systems (FIS) online. In the DFQL system, the generation of continuous actions depends on a discrete set of actions for every fuzzy rule and on the vector of firing strengths of the fuzzy rules. To explore the set of possible actions and acquire experience through the reinforcement signals, the actions are selected using an exploration-exploitation strategy based on the extended greedy algorithm. A function Q that gives the action quality is used together with eligibility traces and a meta-learning rule to speed up learning. The ε-completeness criterion for fuzzy rules and the temporal-difference (TD) error criterion are considered for rule generation. The DFQL approach has been applied to real-time control of a caterpillar robot in a wall-following task. Experimental results and comparative studies with fuzzy Q-learning and continuous-action Q-learning in the wall-following task of mobile robots demonstrate that the proposed DFQL method is superior.
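The abstract names the ingredients (per-rule discrete actions, firing strengths, a Q function, eligibility traces) without giving the update equations. The sketch below is one common way to combine them, in the style of standard fuzzy Q-learning, and is not claimed to be the paper's formulation. All function names, the normalized firing strengths `phi`, and the parameters `alpha`, `gamma`, and `lam` are assumptions; rule generation and the meta-learning rule are omitted.

```python
def blend_action(phi, actions, chosen):
    """Continuous action: firing-strength-weighted sum of each rule's chosen action."""
    return sum(p * actions[k] for p, k in zip(phi, chosen))

def q_of_chosen(phi, q, chosen):
    """Q-value of the blended action under firing strengths phi."""
    return sum(p * q[i][k] for i, (p, k) in enumerate(zip(phi, chosen)))

def state_value(phi, q):
    """State value: each rule contributes its best Q-value, weighted by phi."""
    return sum(p * max(q[i]) for i, p in enumerate(phi))

def td_update(q, trace, phi, chosen, reward, phi_next,
              alpha=0.1, gamma=0.9, lam=0.7):
    """One TD step with accumulating eligibility traces (assumed form).

    q, trace : per-rule lists of per-action values, updated in place
    phi      : normalized firing strengths in the current state
    phi_next : normalized firing strengths in the next state
    """
    delta = reward + gamma * state_value(phi_next, q) - q_of_chosen(phi, q, chosen)
    for i in range(len(q)):
        for k in range(len(q[i])):
            trace[i][k] *= gamma * lam          # decay every trace
            if k == chosen[i]:
                trace[i][k] += phi[i]           # reinforce the chosen action
            q[i][k] += alpha * delta * trace[i][k]
    return delta
```

With two rules firing at strengths 0.25 and 0.75 and choosing the actions -1.0 and 1.0, `blend_action` returns 0.5, which shows how a small discrete action set yields a continuous command.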

Key words: Fuzzy Control, On-Line Self-Organizing, Q-Learning, Embedded System, Real-Time Control

Received: 2005-01-27

CLC number: TP181

Funding: Supported by the National 863 Program of China (No.2001AA422410)

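About the authors: LU YongKui, male, born in 1975, Ph.D. candidate; main research interest: robotics. E-mail: luyongkui@hotmail.com. XU Min, male, born in 1972, Ph.D. candidate; main research interest: robotics. LI YongXin, male, born in 1962, associate professor; main research interest: optoelectronic metrology. DU HuaSheng, male, born in 1942, professor; main research interest: modern design methods. WU YueHua, female, born in 1945, associate professor; main research interest: functional materials. YANG Jie, male, born in 1946, professor; main research interest: robotics.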

Cite this article:

LU YongKui, XU Min, LI YongXin, DU HuaSheng, WU YueHua, YANG Jie. Dynamic Fuzzy Q-Learning and Its Real-Time Application in Embedded System[J]. Pattern Recognition and Artificial Intelligence, 2006, 19(4): 439-444.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/ or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2006/V19/I4/439
