Abstract: This paper presents a new dynamic fuzzy Q-learning (DFQL) method capable of tuning fuzzy inference systems (FIS) online. In the DFQL system, continuous actions are generated from the discrete set of actions attached to each fuzzy rule and the vector of firing strengths of the fuzzy rules. To explore the set of possible actions and acquire experience through the reinforcement signals, actions are selected using an exploration-exploitation strategy based on the expended greedy algorithm. A Q-function that gives the action quality is used together with eligibility traces and a meta-learning rule to speed up learning. Two criteria are considered for rule generation: the ε-completeness of the fuzzy rules and the temporal-difference (TD) error. The DFQL approach has been applied to the real-time control of a caterpillar robot performing a wall-following task. Experimental results and comparative studies against fuzzy Q-learning and continuous-action Q-learning on the wall-following task of mobile robots demonstrate that the proposed DFQL method outperforms both.
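The way a continuous action arises from per-rule discrete actions and firing strengths can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes plain ε-greedy selection in place of the exploration strategy named in the abstract, normalizes the firing strengths by their sum, and omits eligibility traces, the meta-learning rule, and rule generation. The names `select_rule_actions`, `compose`, `q_table`, and `firing` are hypothetical.

```python
import random


def select_rule_actions(q_table, epsilon, rng):
    """Pick one discrete action index per rule (plain epsilon-greedy, an assumption)."""
    chosen = []
    for q_row in q_table:
        if rng.random() < epsilon:
            # Explore: choose any action index uniformly at random.
            chosen.append(rng.randrange(len(q_row)))
        else:
            # Exploit: choose the index with the highest q-value for this rule.
            chosen.append(max(range(len(q_row)), key=q_row.__getitem__))
    return chosen


def compose(firing, chosen, actions, q_table):
    """Blend the rules' chosen actions and q-values by normalized firing strength."""
    total = sum(firing)  # assumed to be positive (at least one rule fires)
    weights = [f / total for f in firing]
    action = sum(w * actions[a] for w, a in zip(weights, chosen))
    q_value = sum(w * q_table[i][a]
                  for i, (w, a) in enumerate(zip(weights, chosen)))
    return action, q_value


# Two rules, three candidate actions (hypothetical steering commands).
actions = [-1.0, 0.0, 1.0]
q_table = [[0.1, 0.5, 0.2],
           [0.3, 0.0, 0.9]]
firing = [0.75, 0.25]

chosen = select_rule_actions(q_table, epsilon=0.0, rng=random.Random(0))
action, q_value = compose(firing, chosen, actions, q_table)
print(chosen, action, q_value)
```

With `epsilon=0.0` every rule acts greedily, so the blended action depends only on the firing strengths; a positive `epsilon` lets individual rules try other actions.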
卢永奎, 许旻, 李永新, 杜华生, 吴月华, 杨杰. 动态模糊Q学习算法及嵌入式平台的实时实现[J]. 模式识别与人工智能, 2006, 19(4): 439-444.
LU YongKui, XU Min, LI YongXin, DU HuaSheng, WU YueHua, YANG Jie. Dynamic Fuzzy Q-Learning and Its Real-Time Application in Embedded System. Pattern Recognition and Artificial Intelligence, 2006, 19(4): 439-444.