A Review of Multi-agent Reinforcement Learning Theory and Applications
CHEN Zhuoran¹, LIU Zeyang¹, WAN Lipeng¹, CHEN Xingyu¹, ZHU Yameng², WANG Chengze², CHENG Xiang³, ZHANG Ya⁴, ZHANG Senlin⁵, WANG Xiaohui⁶, LAN Xuguang¹
1. Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049
2. China Academy of Launch Vehicle Technology, Beijing 100076
3. School of Electronics, Peking University, Beijing 100871
4. School of Automation, Southeast University, Nanjing 210096
5. College of Electrical Engineering, Zhejiang University, Hangzhou 310027
6. Artificial Intelligence Research Institute, China Electric Power Research Institute, Beijing 100192
Abstract: Reinforcement learning (RL) is a widely used machine learning paradigm for sequential decision-making problems. Its core principle is that an agent iteratively learns an optimal policy from the feedback it receives while interacting with its environment. As the computational power and data scale demanded by practical applications continue to grow, the shift from single-agent intelligence to collective intelligence is becoming an inevitable trend in the development of artificial intelligence, bringing both challenges and opportunities for RL. Centered on deep multi-agent reinforcement learning (MARL), this paper identifies and analyzes the current theoretical challenges: limited scalability, credit assignment, the exploration-exploitation dilemma, non-stationarity, and partial observability. The solutions proposed by researchers are reviewed together with their advantages and disadvantages. Typical MARL training and learning environments are introduced, along with practical applications in complex decision-making fields such as smart city construction, gaming, robotics control, and autonomous driving. Finally, the challenges and future research directions of collaborative multi-agent reinforcement learning are summarized.
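To make the interaction-and-feedback loop described in the abstract concrete, the following is a minimal single-agent sketch of tabular Q-learning. It is not taken from the paper: the 1-D chain environment, the function name `train_q_learning`, and all hyperparameter values are illustrative assumptions.

```python
import random


def train_q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain (an assumed toy task).

    States are 0..n_states-1; action 0 moves left, action 1 moves right.
    Entering the last state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on steps per episode
            # Epsilon-greedy action selection; ties are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            # Environment transition and reward (the feedback signal).
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == goal
            reward = 1.0 if done else 0.0
            # Temporal-difference update toward the bootstrapped target.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train_q_learning()
# Greedy action in every non-terminal state after training.
greedy_policy = [max((0, 1), key=lambda a: q_table[s][a])
                 for s in range(len(q_table) - 1)]
print(greedy_policy)
```

In the multi-agent setting that the paper surveys, several such learners update at the same time, so each agent's environment also includes the other agents' changing policies. This is the source of the non-stationarity and credit assignment issues listed in the abstract.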
CHEN Zhuoran, LIU Zeyang, WAN Lipeng, CHEN Xingyu, ZHU Yameng, WANG Chengze, CHENG Xiang, ZHANG Ya, ZHANG Senlin, WANG Xiaohui, LAN Xuguang. A Review of Multi-agent Reinforcement Learning Theory and Applications. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2024, 37(10): 851-872.