Role-Based Adaptive Parameter Sharing Method

doi:10.16451/j.cnki.issn1003-6059.202503001



Authors: FANG Baofu (方宝富), WANG Qiong (王琼), WANG Hao (王浩), WANG Zaijun (王在俊)


Abstract: In large-scale heterogeneous multi-agent reinforcement learning, parameter sharing is commonly used to reduce the number of trainable parameters and speed up training. However, full parameter sharing tends to make agents behave too uniformly, while training independent parameters for each agent is limited by computational cost and memory. To address this, a role-based adaptive parameter sharing (RAPS) method is proposed. First, agents are grouped into roles according to their task characteristics. Then, within a single shared network architecture, unstructured network pruning is used to generate a sparse sub-network for each role, and a dynamic adjustment mechanism adaptively tunes the ratio of shared to independent parameters according to task requirements. In addition, a collaborative loss function between roles further strengthens coordination among heterogeneous agents. RAPS thus reduces computational complexity while preserving behavioral diversity among heterogeneous agents. Experiments on several different multi-agent tasks show that RAPS improves both the performance and the scalability of multi-agent systems.
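The idea of giving each role its own sparse view of one shared network can be illustrated with a minimal sketch. It assumes a single shared weight matrix, one hypothetical score matrix per role, and plain magnitude-based unstructured pruning that turns each role's scores into a binary mask; here a weight counts as "shared" only if every role keeps it. The helper names (`group_by_role`, `magnitude_mask`, `masked_forward`, `shared_fraction`), the role labels, and the 0.5 keep ratio are illustrative assumptions, not the RAPS implementation, and neither the dynamic ratio adjustment nor the inter-role collaborative loss is modeled.

```python
import random

# Illustrative sketch only: names, role labels and keep ratio are hypothetical,
# not taken from the RAPS paper.


def group_by_role(agent_roles):
    """Group agent ids by role label (stand-in for grouping by task characteristics)."""
    groups = {}
    for agent_id, role in agent_roles.items():
        groups.setdefault(role, []).append(agent_id)
    return groups


def magnitude_mask(scores, keep_ratio):
    """Unstructured pruning: keep the keep_ratio fraction of entries with largest |score|."""
    flat = sorted((abs(s) for row in scores for s in row), reverse=True)
    k = max(1, int(round(keep_ratio * len(flat))))
    threshold = flat[k - 1]
    # Ties at the threshold may keep slightly more than k entries.
    return [[1 if abs(s) >= threshold else 0 for s in row] for row in scores]


def masked_forward(weight, mask, x):
    """Compute y = (W * M_r) x: a role-specific sparse view of the shared weights W."""
    return [
        sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
        for w_row, m_row in zip(weight, mask)
    ]


def shared_fraction(masks):
    """Among weights kept by at least one role, the fraction kept by every role."""
    rows, cols = len(masks[0]), len(masks[0][0])
    kept_any = kept_all = 0
    for i in range(rows):
        for j in range(cols):
            kept = [m[i][j] for m in masks]
            kept_any += any(kept)
            kept_all += all(kept)
    return kept_all / kept_any if kept_any else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    n_out, n_in = 4, 6
    # One shared parameter matrix used by all agents.
    W = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    agent_roles = {"a0": "attacker", "a1": "attacker", "a2": "healer"}
    groups = group_by_role(agent_roles)
    # Hypothetical per-role scores; a learned method would optimize these.
    scores = {
        r: [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        for r in groups
    }
    masks = {r: magnitude_mask(s, keep_ratio=0.5) for r, s in scores.items()}
    x = [1.0] * n_in
    for agent, role in agent_roles.items():
        print(agent, role, [round(v, 3) for v in masked_forward(W, masks[role], x)])
    print("shared fraction:", round(shared_fraction(list(masks.values())), 3))
```

In this sketch, agents in the same role see identical masked weights, while different roles see different sparse views of one parameter set. With independently drawn scores, lowering `keep_ratio` makes each mask sparser and tends to shrink the shared fraction, which is the kind of quantity a dynamic adjustment mechanism could plausibly tune.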

Key words: Large-Scale Heterogeneous Multi-agent Reinforcement Learning, Parameter Sharing, Unstructured Network Pruning, Role Grouping

Received: 2025-01-16

CLC number: TP391

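Supported by the Natural Science Foundation of Anhui Province (No. 2308085MF203), the University Synergy Innovation Program of Anhui Province (No. GXXT-2022-055), the Open Fund of the Key Laboratory of Flight Techniques and Flight Safety, CAAC (No. FZ2022KF09), and the Key Project of the Key Laboratory of Flight Techniques and Flight Safety, CAAC (No. FZ2022ZZ02)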

Corresponding author: FANG Baofu, Ph.D., associate professor. Research interests: intelligent robot systems. E-mail: fangbf@hfut.edu.cn.

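About the authors: WANG Qiong, master's student. Research interests: multi-agent deep reinforcement learning. E-mail: 2324289404@qq.com.
WANG Hao, Ph.D., professor. Research interests: distributed intelligent systems and robotics. E-mail: jsjxwangh@hfut.edu.cn.
WANG Zaijun, M.S., research fellow. Research interests: multi-robot task allocation and artificial intelligence. E-mail: tiantian20030315@126.com.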

Cite this article:

FANG Baofu, WANG Qiong, WANG Hao, WANG Zaijun. Role-Based Adaptive Parameter Sharing Method (基于角色的自适应参数共享方法). Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2025, 38(3): 193-204.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.202503001 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2025/V38/I3/193
