FANG Baofu1,2, WANG Qiong1, WANG Hao1, WANG Zaijun3
1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601; 2. College of Computer and Information Engineering, Xinjiang Agricultural University, Urumqi 830052; 3. Key Laboratory of Flight Techniques and Flight Safety, Civil Aviation Flight University of China, Guanghan 618307
Abstract In large-scale heterogeneous multi-agent reinforcement learning, parameter sharing is often used to reduce the number of trainable parameters and accelerate training. However, full parameter sharing tends to cause excessive behavioral uniformity among agents, while fully independent training is constrained by computational cost and memory. To address this, a role-based adaptive parameter sharing (RAPS) method is proposed. First, agents are grouped into roles according to their task characteristics. Then, within a unified network structure, unstructured network pruning is applied to generate a sparse sub-network for each agent role. A dynamic adjustment mechanism adaptively tunes the ratio of shared to independent parameters according to task requirements, and an inter-role collaborative loss is incorporated to further strengthen coordination among heterogeneous agents. RAPS thus reduces computational complexity while preserving behavioral diversity among heterogeneous agents. Experimental results on a range of multi-agent tasks demonstrate that RAPS significantly improves the performance and scalability of multi-agent systems.
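The core mechanism described above — one shared network from which each role draws its own sparse sub-network via a pruning mask — can be illustrated with a minimal sketch. This is a simplified illustration, not the paper's implementation: the role names, the random-mask criterion, and the fixed keep ratio are all assumptions (the paper derives masks via unstructured pruning and adapts the shared/independent ratio dynamically).

```python
import numpy as np

def role_mask(shape, keep_ratio, seed):
    """Draw a role-specific binary mask keeping `keep_ratio` of the shared
    weights (random-ticket style; the paper's pruning criterion may differ)."""
    rng = np.random.default_rng(seed)
    flat = np.zeros(int(np.prod(shape)))
    k = int(round(keep_ratio * flat.size))
    flat[rng.choice(flat.size, size=k, replace=False)] = 1.0
    return flat.reshape(shape)

rng = np.random.default_rng(0)
W_shared = rng.standard_normal((8, 8))   # one layer of the unified network

# Each role acts through its own sparse sub-network over the same weights.
# "attacker"/"healer" are hypothetical role labels for illustration.
masks = {role: role_mask(W_shared.shape, keep_ratio=0.5, seed=s)
         for role, s in [("attacker", 1), ("healer", 2)]}

def forward(x, role):
    # Forward pass uses only the weights kept by this role's mask.
    return np.tanh(x @ (W_shared * masks[role]))

x = rng.standard_normal(8)
out_attacker = forward(x, "attacker")
out_healer = forward(x, "healer")

# Entries kept by both masks are the shared parameters; entries kept by
# only one mask give that role independent behavior. Adapting keep_ratio
# per role would shift this shared/independent balance.
shared_frac = (masks["attacker"] * masks["healer"]).mean()
```

Because the two masks overlap only partially, gradient updates on the overlapping entries are shared across roles, while the disjoint entries let each role's policy diverge — which is how pruning-based sub-networks avoid the behavioral uniformity of full parameter sharing.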
Fund: Natural Science Foundation of Anhui Province (No. 2308085MF203), University Synergy Innovation Program of Anhui Province (No. GXXT-2022-055), Open Fund of Key Laboratory of Flight Techniques and Flight Safety, CAAC (No. FZ2022KF09), R&D Program of Key Laboratory of Flight Techniques and Flight Safety, CAAC (No. FZ2022ZZ02)
Corresponding Author: FANG Baofu, Ph.D., associate professor. His research interests include intelligent robot systems.
About authors: WANG Qiong, Master student. Her research interests include multi-agent deep reinforcement learning. WANG Hao, Ph.D., professor. His research interests include distributed intelligent systems and robots. WANG Zaijun, Master, researcher. Her research interests include multi-robot task allocation and artificial intelligence.