Multiagent Cooperative Learning Based on Coordination of Boundary Samples
HAN Wei
School of Information Engineering, Nanjing University of Finance and Economics, Nanjing 210046

Abstract To address the slow convergence of Q-learning caused by a large state space, a multiagent cooperative learning method based on the coordination of boundary samples is proposed. Each agent specializes in its own subspace, and the agents coordinate through Boolean functions in boundary states. Simulation results show that the proposed method outperforms traditional global learning.
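
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the author's method. It assumes a hypothetical 10-state chain task split into two subspaces, one Q-table per agent, and a Boolean test (`is_boundary_sample`, a name invented here) that flags transitions crossing from one subspace into the other. On such a boundary sample the learning agent bootstraps from the neighbouring agent's value estimate. All names, constants, and the task itself are illustrative.

```python
import random

N_STATES = 10          # chain 0..9, goal at state 9 (hypothetical toy task)
SPLIT = 5              # assumption: agent 0 owns states 0..4, agent 1 owns 5..9
ACTIONS = (-1, +1)     # move left / move right
ALPHA, GAMMA = 0.5, 0.9


def owner(state):
    """Index of the agent whose subspace contains `state`."""
    return 0 if state < SPLIT else 1


def is_boundary_sample(state, next_state):
    """Boolean coordination test: does the transition cross subspaces?"""
    return owner(state) != owner(next_state)


def step(state, action):
    """Deterministic chain dynamics with reward 1 on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=300, seed=0):
    rng = random.Random(seed)
    # One Q-table per agent, covering only that agent's own states.
    tables = [
        {(s, a): 0.0 for s in range(N_STATES) if owner(s) == i for a in ACTIONS}
        for i in range(2)
    ]
    boundary_samples = 0
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # uniform exploration (off-policy)
            next_state, reward, done = step(state, action)
            learner = tables[owner(state)]
            # The successor is valued by the agent that owns it; on a
            # boundary sample that is the neighbouring agent's table.
            evaluator = tables[owner(next_state)]
            if is_boundary_sample(state, next_state):
                boundary_samples += 1
            future = 0.0 if done else max(evaluator[(next_state, a)] for a in ACTIONS)
            target = reward + GAMMA * future
            learner[(state, action)] += ALPHA * (target - learner[(state, action)])
            state = next_state
    return tables, boundary_samples


def greedy_action(tables, state):
    """Greedy action according to the agent that owns `state`."""
    table = tables[owner(state)]
    return max(ACTIONS, key=lambda a: table[(state, a)])
```

Calling `train()` returns the two per-agent tables and the number of boundary samples seen; `greedy_action(tables, s)` then queries whichever agent owns state `s`. Each table is indexed only by its owner's states, which is where the reduction in per-agent state space would come from in a larger problem.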
Received: 05 December 2006