Abstract: Multi-agent reinforcement learning algorithms are difficult to adapt to environments in which the number of agents changes dynamically. To address this problem, a sequence-to-sequence multi-agent reinforcement learning algorithm (SMARL) based on sequential learning and a block structure is proposed. The control network of each agent is divided into an action network, built on the deep deterministic policy gradient (DDPG) structure, and a target network, built on the sequence-to-sequence structure, which decouples the algorithm's architecture from the number of agents. The inputs and outputs of the algorithm are likewise processed to break the dependence of the learned policy on the number of agents. Agents in SMARL can quickly adapt to new environments, take on different roles within a task, and learn rapidly. Experiments show that the proposed algorithm outperforms baseline algorithms in adaptability, performance, and training efficiency.
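The abstract describes the architecture only at a high level; the exact layer composition is not given here. The following PyTorch sketch is a minimal illustration of the core idea under stated assumptions: a GRU encoder-decoder stands in for the sequence-to-sequence target network, and a small feed-forward deterministic policy stands in for the DDPG-style action network. The class names, dimensions, and the step-by-step decoding loop are hypothetical choices for illustration, not the paper's implementation.

import torch
import torch.nn as nn


class Seq2SeqTargetNetwork(nn.Module):
    """Hypothetical sketch of a sequence-to-sequence target network.

    Encodes a variable-length sequence of per-agent observations and
    decodes one target vector per agent, so the parameter count is
    independent of how many agents are present.
    """

    def __init__(self, obs_dim: int, hidden_dim: int, target_dim: int):
        super().__init__()
        self.encoder = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(target_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, target_dim)

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, n_agents, obs_dim); n_agents may differ per call.
        _, hidden = self.encoder(obs_seq)
        batch, n_agents, _ = obs_seq.shape
        step = obs_seq.new_zeros(batch, 1, self.out.out_features)
        targets = []
        for _ in range(n_agents):
            # Decode one target per agent, feeding back the previous output.
            dec_out, hidden = self.decoder(step, hidden)
            step = self.out(dec_out)           # (batch, 1, target_dim)
            targets.append(step)
        return torch.cat(targets, dim=1)       # (batch, n_agents, target_dim)


class ActionNetwork(nn.Module):
    """DDPG-style deterministic policy shared across agents (sketch)."""

    def __init__(self, obs_dim: int, target_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + target_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh(),
        )

    def forward(self, obs: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, target], dim=-1))


if __name__ == "__main__":
    # The same networks handle any number of agents without reshaping,
    # which is the scale-independence property the abstract claims.
    tgt_net = Seq2SeqTargetNetwork(obs_dim=8, hidden_dim=32, target_dim=4)
    act_net = ActionNetwork(obs_dim=8, target_dim=4, act_dim=2)
    for n_agents in (3, 10):
        obs = torch.randn(2, n_agents, 8)      # batch of 2 episodes
        targets = tgt_net(obs)                 # (2, n_agents, 4)
        actions = act_net(obs, targets)        # (2, n_agents, 2)
        print(n_agents, actions.shape)

The usage loop above is the point of the construction: because the target network consumes the agents as a sequence and the action network is applied per agent, neither network's weights reference the agent count, so the same trained model can be run unchanged when agents join or leave.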