A Review of Multi-agent Reinforcement Learning Theory and Applications

doi:10.16451/j.cnki.issn1003-6059.202410001

Abstract: Reinforcement learning (RL) is a widely used machine learning paradigm for sequential decision-making problems. Its core principle is that an agent iteratively learns an optimal policy from the feedback it receives while interacting with its environment. As practical applications demand ever more computational power and data, the transition from single-agent intelligence to collective intelligence is becoming an inevitable trend in the development of artificial intelligence, which presents RL with both challenges and opportunities. Building on the concept of deep multi-agent reinforcement learning (MARL), this paper identifies and analyzes the current theoretical difficulties: limited scalability, credit assignment, the exploration-exploitation dilemma, non-stationarity, and partial observability. The solutions proposed by researchers are described along with their advantages and disadvantages. Typical MARL training and learning environments are introduced, together with practical applications in complex decision-making fields such as smart city construction, gaming, robotics control, and autonomous driving. Finally, the challenges and future development directions of collaborative multi-agent reinforcement learning are summarized.

Key words: Deep Reinforcement Learning; Multi-agent; Credit Assignment; Human Feedback; Markov Decision Process

Received: 30 September 2024

ZTFLH: TP 181

Fund: National Key Research and Development Program of China (No. 2021ZD0112700), National Natural Science Foundation of China (No. 62125305, 62088102, U23A20339, 62203348)

Corresponding Author: LAN Xuguang, Ph.D., professor. His research interests include computer vision and machine learning.

About authors: CHEN Zhuoran, Ph.D. candidate. His research interests include deep reinforcement learning. LIU Zeyang, Ph.D., assistant professor. His research interests include deep reinforcement learning. WAN Lipeng, Ph.D., assistant professor. His research interests include deep reinforcement learning and coexisting-cooperative-cognitive robots. CHEN Xingyu, Ph.D., assistant professor. His research interests include computer vision and machine learning. ZHU Yameng, Master, engineer. Her research interests include game theory and autonomous control of agents. WANG Chengze, Master's student. His research interests include game theory and autonomous control of agents. CHENG Xiang, Ph.D., professor. His research interests include data-driven intelligence network and networked intelligence. ZHANG Ya, Ph.D., professor. Her research interests include multi-agent game theory and reinforcement learning. ZHANG Senlin, Master, professor. His research interests include control theory and its applications. WANG Xiaohui, Ph.D., senior engineer. His research interests include electric power artificial intelligence, electric power systems and automation.

Cite this article:

CHEN Zhuoran, LIU Zeyang, WAN Lipeng, et al. A Review of Multi-agent Reinforcement Learning Theory and Applications[J]. Pattern Recognition and Artificial Intelligence, 2024, 37(10): 851-872.

URL:

http://manu46.magtech.com.cn/Jweb_prai/EN/10.16451/j.cnki.issn1003-6059.202410001 OR http://manu46.magtech.com.cn/Jweb_prai/EN/Y2024/V37/I10/851
