Multi-agent Cooperation Algorithm Based on Individual Gap Emotion in Sparse Reward Scenarios
WANG Hao1,2, WANG Jing1,2, FANG Baofu1,2
1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601; 2. Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine, Hefei University of Technology, Hefei 230601
|
|
Abstract To address the sparse reward problem faced by reinforcement learning in multi-agent environments, a multi-agent cooperation algorithm based on individual gap emotion is proposed, grounded in the role of emotions in human learning and decision making. The approximate joint action value function is optimized end-to-end to train individual policies, and the individual action value function of each agent is taken as its evaluation of an event. A gap emotion is generated from the gap between this predicted evaluation and the actual outcome. The gap emotion model serves as an intrinsic motivation mechanism that generates an intrinsic emotion reward for each agent as an effective supplement to the extrinsic reward, thereby alleviating the problem of sparse extrinsic rewards. Moreover, the intrinsic emotion reward is task-independent and therefore possesses some generality. The effectiveness and robustness of the proposed algorithm are verified in multi-agent pursuit scenarios with different sparsity levels.
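To make the mechanism above concrete, a minimal Python sketch of one plausible reading follows. It assumes a TD-style bootstrapped target plays the role of the "actual situation"; the names gap_emotion_reward, q_pred, q_target and beta are illustrative and not taken from the paper.

    import torch

    def gap_emotion_reward(q_pred: torch.Tensor,
                           q_target: torch.Tensor,
                           beta: float = 0.1) -> torch.Tensor:
        # Gap between the agent's predicted evaluation of an event (its
        # individual action value) and the actual outcome (e.g. a
        # bootstrapped TD target). A positive gap reads as "better than
        # expected", a negative gap as "worse than expected".
        gap = q_target - q_pred
        # Bounded, task-independent emotional signal used as an intrinsic
        # reward; beta scales its influence on learning.
        return beta * torch.tanh(gap)

    def shaped_reward(r_ext: torch.Tensor, r_int: torch.Tensor) -> torch.Tensor:
        # The sparse extrinsic reward is supplemented, not replaced, by
        # each agent's intrinsic emotion reward.
        return r_ext + r_int

Because the intrinsic term in this sketch depends only on the agent's own value estimates, it involves no task-specific quantities, which is what gives the emotion reward its generality across scenarios.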
|
Received: 06 September 2021
|
|
Fund: National Natural Science Foundation of China (No. 61872327), Open Fund of Key Laboratory of Flight Techniques and Flight Safety of CAAC (No. FZ2020KF07)
Corresponding Author:
FANG Baofu, Ph.D., associate professor. His research interests include multi-robot/agent systems, emotion agents and reinforcement learning.
|
About the authors: WANG Hao, Ph.D., professor. His research interests include artificial intelligence and robots. WANG Jing, master's student. His research interests include multi-agent reinforcement learning and emotion agents.
|
|
|
|
|
|