A Method to Design Reinforcement Function Based on Fuzzy Rules in Q-Learning

Abstract: Q-learning is a reinforcement learning method for solving Markov decision problems with incomplete information, and the design of the reinforcement (reward) signal is an important factor in how well it learns. This paper proposes a method for designing the Q-learning reward signal based on fuzzy rules in order to improve learning performance. The method is applied to optimal signal control at a single intersection, where the phase switching time and phase sequence are adjusted adaptively as traffic flow changes. Evaluation with the Paramics microscopic traffic simulation software indicates that, for this traffic control problem, Q-learning with the fuzzy-rule-based reward learns better than conventional Q-learning.

Key words: Q-Learning, Reinforcement Function, Fuzzy Rules, Traffic Signal Control, Paramics Microscopic Traffic Simulation Software
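
To make the idea in the abstract concrete, the following is a minimal sketch of how a fuzzy-rule-based reward could feed a tabular Q-learning update for signal control. It is not the authors' rule base or state design: the action names, the single queue-length input, the `full` saturation value, the three fuzzy sets, and the rule consequents in `RULES` are all assumptions chosen for illustration.

```python
from collections import defaultdict

ACTIONS = ("keep_phase", "switch_phase")  # hypothetical action set


def memberships(queue, full=20.0):
    """Degrees to which a queue length is 'short', 'medium' or 'long'.

    `full` (vehicles) is an assumed saturation point, not a value from the paper.
    """
    x = min(max(queue / full, 0.0), 1.0)   # normalise to [0, 1]
    short = max(0.0, 1.0 - 2.0 * x)        # 1 at x=0, falls to 0 at x=0.5
    long_ = max(0.0, 2.0 * x - 1.0)        # 0 until x=0.5, rises to 1 at x=1
    medium = 1.0 - short - long_           # triangle peaking at x=0.5
    return {"short": short, "medium": medium, "long": long_}


# Assumed rule base: IF queue is <label> THEN reward is <value>.
RULES = {"short": 1.0, "medium": 0.0, "long": -1.0}


def fuzzy_reward(queue):
    """Weighted average of the rule consequents (zero-order Sugeno style)."""
    mu = memberships(queue)
    total = sum(mu.values())
    return sum(mu[label] * RULES[label] for label in RULES) / total


def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step; returns the updated Q(state, action)."""
    best_next = max(Q[next_state][a] for a in ACTIONS)
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
    return Q[state][action]


# Q-table that lazily creates a zero row for every newly seen state.
Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

# Example step: 5 queued vehicles observed after keeping the current phase.
r = fuzzy_reward(5)
q_update(Q, state="light", action="keep_phase", reward=r, next_state="light")
```

In this sketch the reward varies continuously with the observed queue instead of jumping between a few fixed values, which is one plausible reading of why a fuzzy reward might give the learner a more informative signal than a crisp one. The state labels used in the example step are placeholders.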

Received: 2006-06-07

CLC number: TP391

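About the authors: ZHAO Xiaohua, female, born in 1971, associate professor, Ph.D. Her main research interests are intelligent transportation and control theory and applications. E-mail: zhaoxiaohua@bjut.edu.cn. LI Zhenlong, male, born in 1976, associate professor, Ph.D. His main research interest is traffic information and control. CHEN Yangzhou, male, born in 1962, professor and doctoral supervisor. His main research interest is control theory and applications. RONG Jian, male, born in 1973, professor, Ph.D. His main research interest is traffic information and control.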

Cite this article:

ZHAO Xiaohua, LI Zhenlong, CHEN Yangzhou, RONG Jian. A Method to Design Reinforcement Function Based on Fuzzy Rules in Q-Learning [J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2008, 21(2): 254-259.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/ or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2008/V21/I2/254
