Foremost-Policy Reinforcement Learning Based ART2 Neural Network<sup>*</sup>


摘要提出一种基于最先策略增强学习的ART2神经网络FPRLART2(ForemostPolicy Reinforcement Learning based ART2 neural network),并介绍其学习算法.为了达到在线学习的目的,在FPRLART2中,从状态到行为值之间的映射中,选择第一个得到奖励的行为,而不是选择诸如1step QLearning中具有最优行为值的行为.ART2神经网络用于存储分类模式,其权重通过增强学习增强或减弱,达到学习的目的.并将FPRLART2运用到移动机器人避碰撞问题的研究中.仿真实验表明,引入FPRLART2后减少移动机器人与障碍物发生碰撞的次数,具有良好的避碰效果.


Keywords: reinforcement learning, ART2 neural network, foremost policy, collision avoidance

Abstract：A foremostpolicy reinforcement learning based ART2 neural network (FPRLART2) and its learning algorithm are proposed in this paper. To fit the requirement of real time learning, the first awarded behavior based on present states is selected in our ForemostPolicy Reinforcement Learning (FPRL) in stead of the optimal behavior in 1step QLearning. The algorithm of FPRL is given and it is integrated with ART2 neural network. The stored weights of classified pattern in ART2 is increased or decreased by reinforcement learning. The FPRLART2 is successfully used in collision avoidance of mobile robot and the simulation experiment indicates that the times of collision between robot and obstacle is effectively decreased. The FPRLART2 makes favorable result of collision avoidance.

Key words: Reinforcement Learning, ART2 Neural Network, Foremost-Policy, Collision Avoidance

Received: 2004-12-19

Chinese Library Classification (ZTFLH): TP18

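Funding: Supported by the Shanghai Science and Technology Development Fund (No. 015115042) and the 4th Key Discipline Construction Project of the Shanghai Municipal Education Commission (No. B682).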

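About the authors: FAN Jian, male, born in 1978, Ph.D. candidate. His main research interests are intelligent information processing and robot control. E-mail: jfan@mail.shu.edu.cn. WU Gengfeng, male, born in 1945, professor and Ph.D. supervisor. His main research interests are intelligent control, neural networks, fuzzy logic, and expert systems.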

Cite this article:

FAN Jian, WU Gengfeng. Foremost-Policy Reinforcement Learning Based ART2 Neural Network[J]. Pattern Recognition and Artificial Intelligence, 2006, 19(3): 428-432.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/ or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2006/V19/I3/428
