Pattern Recognition and Artificial Intelligence  2024, Vol. 37 Issue (5): 435-446    DOI: 10.16451/j.cnki.issn1003-6059.202405005
Papers and Reports
Multi-agent Reinforcement Learning Algorithm Based on State Space Exploration in Sparse Reward Scenarios
FANG Baofu1,2, YU Tingting1,2, WANG Hao1,2, WANG Zaijun3
1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601;
2. Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine, Hefei University of Technology, Hefei 230601;
3. Key Laboratory of Flight Techniques and Flight Safety, Civil Aviation Flight University of China, Guanghan 618307

Abstract  Multi-agent task scenarios often involve large and diverse state spaces, and in some cases the reward information provided by the external environment is extremely limited, exhibiting sparse-reward characteristics. Most existing multi-agent reinforcement learning algorithms are of limited effectiveness in such sparse reward scenarios, since relying only on accidentally discovered reward sequences makes learning slow and inefficient. To address this issue, a multi-agent reinforcement learning algorithm based on state space exploration (MASSE) in sparse reward scenarios is proposed. MASSE constructs a subset space of states and maps one state from this subset as an intrinsic goal, enabling agents to exploit the state space more fully and reduce unnecessary exploration. Agent states are decomposed into self-states and environmental states, and intrinsic rewards based on mutual information are generated by combining these two types of states with the intrinsic goals. By constructing the state subset space and generating mutual-information-based intrinsic rewards, states close to the goal states and states that improve the agents' understanding of the environment are rewarded appropriately. Consequently, agents are motivated to move toward the goal more actively while deepening their understanding of the environment, and are guided to adapt flexibly to sparse reward scenarios. Experimental results indicate that MASSE achieves superior performance in multi-agent collaborative scenarios with varying degrees of reward sparsity.
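The reward-shaping scheme described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the farthest-from-centroid rule for choosing an intrinsic goal from the state subset, and the plug-in histogram estimator for the mutual-information bonus between self-state and environmental-state trajectories, are hypothetical stand-ins chosen only to make the two ingredients (goal proximity, MI bonus) concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_intrinsic_goal(visited_states, n_subset=32):
    """Sample a subset of visited states and pick one as the intrinsic goal.
    Choosing the state farthest from the subset centroid (a heuristic
    stand-in for MASSE's mapping) favors under-explored regions."""
    idx = rng.choice(len(visited_states),
                     size=min(n_subset, len(visited_states)), replace=False)
    subset = visited_states[idx]
    centroid = subset.mean(axis=0)
    return subset[np.argmax(np.linalg.norm(subset - centroid, axis=1))]

def mi_histogram(x, y, bins=8):
    """Plug-in histogram estimate of I(X; Y) in nats for 1-D samples."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of X
    py = pxy.sum(axis=0, keepdims=True)   # marginal of Y
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px * py)[nz])).sum())

def intrinsic_reward(self_state, goal, self_traj, env_traj,
                     alpha=1.0, beta=0.1):
    """Proximity term toward the intrinsic goal plus an MI bonus between
    the self-state trajectory and the environmental-state trajectory."""
    proximity = -np.linalg.norm(self_state - goal)
    return alpha * proximity + beta * mi_histogram(self_traj, env_traj)
```

Under this sketch, an agent near the intrinsic goal receives a higher shaped reward than a distant one, and the MI bonus rewards self-states that are informative about the environmental state, matching the abstract's two shaping terms in spirit.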
Key words: Reinforcement Learning, Sparse Reward, Mutual Information, Intrinsic Rewards
Received: 07 April 2024     
ZTFLH: TP391  
Fund:National Natural Science Foundation of China(No.61872327), Natural Science Foundation of Anhui Province(No.2308085MF203), Project of Collaborative Innovation in Anhui Colleges and Universities(No.GXXT-2022-055), Open Fund of Key Laboratory of Flight Techniques and Flight Safety of Civil Aviation Administration of China(No.FZ2020KF07)
Corresponding Author: FANG Baofu, Ph.D., associate professor. His research interests include intelligent robot systems.
About authors: YU Tingting, Master student. Her research interests include multi-agent deep reinforcement learning. WANG Hao, Ph.D., professor. His research interests include distributed intelligent systems and robots. WANG Zaijun, Master, professor. Her research interests include multi-robot task allocation and artificial intelligence.
Cite this article:   
FANG Baofu, YU Tingting, WANG Hao, et al. Multi-agent Reinforcement Learning Algorithm Based on State Space Exploration in Sparse Reward Scenarios[J]. Pattern Recognition and Artificial Intelligence, 2024, 37(5): 435-446.
URL:  
http://manu46.magtech.com.cn/Jweb_prai/EN/10.16451/j.cnki.issn1003-6059.202405005      OR     http://manu46.magtech.com.cn/Jweb_prai/EN/Y2024/V37/I5/435
Copyright © 2010 Editorial Office of Pattern Recognition and Artificial Intelligence
Address: No.350 Shushanhu Road, Hefei, Anhui Province, P.R. China Tel: 0551-65591176 Fax: 0551-65591176 Email: bjb@iim.ac.cn