Emotion-Based Heterogeneous Multi-agent Reinforcement Learning with Sparse Reward
FANG Baofu1,2, MA Yunting1,2, WANG Zaijun3, WANG Hao1,2
1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
2. Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine, Hefei University of Technology, Hefei 230601, China
3. Key Laboratory of Flight Techniques and Flight Safety, Civil Aviation Flight University of China, Guanghan 618307, China
Abstract: In reinforcement learning, an agent's convergence speed and learning efficiency degrade sharply when a sparse reward distribution prevents it from acquiring effective experience. To address this sparse reward problem, an emotion-based heterogeneous multi-agent reinforcement learning method is proposed in this paper. Firstly, an emotion model based on personality is established to provide an intrinsic incentive mechanism for multiple heterogeneous agents as an effective supplement to the external reward. Then, building on this mechanism, a deep deterministic policy gradient reinforcement learning algorithm with an intrinsic emotional incentive under sparse rewards is proposed to accelerate the agents' convergence. Finally, multi-robot pursuit is used as the simulation platform, sparse reward scenarios of different difficulty levels are constructed, and the effectiveness and superiority of the proposed method in terms of pursuit success rate and convergence speed are verified.
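The abstract outlines the core mechanism: a personality-parameterized emotion model produces an intrinsic incentive that supplements the sparse external reward before the policy update. The sketch below illustrates one plausible wiring of such reward shaping; the class and parameter names (PersonalityEmotionModel, shaped_reward, beta) and the simple decay dynamics are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

class PersonalityEmotionModel:
    """Minimal sketch of a personality-based emotion model: a fixed
    personality vector weights per-step appraisal features, and the
    resulting emotional intensity decays over time. These dynamics
    are an assumption for illustration only."""

    def __init__(self, personality, decay=0.9):
        self.personality = np.asarray(personality, dtype=float)
        self.decay = decay    # how quickly emotion fades between steps
        self.intensity = 0.0  # current emotional intensity

    def update(self, appraisal):
        # appraisal: per-transition features the agent can compute locally,
        # e.g. state novelty, progress toward the evader, teammate proximity.
        appraisal = np.asarray(appraisal, dtype=float)
        self.intensity = (self.decay * self.intensity
                          + float(self.personality @ appraisal))
        return self.intensity


def shaped_reward(r_extrinsic, appraisal, emotion_model, beta=0.1):
    """Sum of the sparse external reward and the intrinsic emotional
    incentive; beta (assumed hyperparameter) balances the two terms.
    The result would stand in for the raw reward in the critic target."""
    return r_extrinsic + beta * emotion_model.update(appraisal)


# Usage: each heterogeneous pursuer carries its own personality profile.
pursuer = PersonalityEmotionModel(personality=[0.8, 0.2, 0.5])
r = shaped_reward(r_extrinsic=0.0,            # sparse: zero on most steps
                  appraisal=[0.3, 0.6, 0.1],  # hypothetical appraisal features
                  emotion_model=pursuer)
print(r)  # non-zero learning signal even when the external reward is absent
```

Because the emotion state is kept per agent, identical transitions yield different intrinsic rewards under different personality vectors, which is what makes the incentive heterogeneous across pursuers.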