Multi-agent Cooperation Algorithm Based on Individual Gap Emotion in Sparse Reward Scenarios
WANG Hao1,2, WANG Jing1,2, FANG Baofu1,2
1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601; 2. Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine, Hefei University of Technology, Hefei 230601
Abstract: To address the sparse reward problem faced by reinforcement learning in multi-agent environments, a multi-agent cooperation algorithm based on individual gap emotion is proposed, drawing on the role of emotions in human learning and decision making. The approximate joint action value function is optimized end-to-end to train individual policies, and the individual action value function of each agent is taken as its evaluation of the current event. A gap emotion is generated from the gap between the predicted evaluation and the actual outcome. The gap emotion model serves as an intrinsic motivation mechanism that produces an intrinsic emotional reward for each agent as an effective supplement to the extrinsic reward, thereby alleviating the problem of sparse extrinsic rewards. Moreover, the intrinsic emotional reward is task-independent and consequently possesses a degree of generality. The effectiveness and robustness of the proposed algorithm are verified in multi-agent pursuit scenarios with different sparsity levels.
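To make the mechanism in the abstract concrete, the sketch below illustrates one plausible reading of the gap-emotion signal: the difference between an agent's predicted evaluation (its individual action value) and the actually observed outcome is squashed into a bounded intrinsic reward that supplements the sparse extrinsic reward. All names, the tanh squashing, and the scaling coefficients are assumptions for illustration only, not the paper's exact formulation.

```python
import math

def gap_emotion_reward(predicted_value: float, observed_value: float,
                       scale: float = 0.1) -> float:
    """Hypothetical sketch of the gap-emotion signal: map the gap between an
    agent's predicted evaluation and the observed outcome to a bounded
    intrinsic reward. The tanh squashing and scale are assumed, not specified
    in the abstract."""
    gap = observed_value - predicted_value   # > 0: better than expected; < 0: worse
    return scale * math.tanh(gap)            # bounded emotional response (assumption)

def shaped_reward(extrinsic_reward: float, predicted_value: float,
                  observed_value: float, beta: float = 1.0) -> float:
    # The intrinsic emotional reward supplements the (possibly sparse) extrinsic reward.
    return extrinsic_reward + beta * gap_emotion_reward(predicted_value, observed_value)

# Usage sketch: even on a step with zero extrinsic reward, the agent still
# receives a (negative here) learning signal from the unmet expectation.
print(shaped_reward(extrinsic_reward=0.0, predicted_value=0.8, observed_value=0.2))
```

Because the signal depends only on the agent's own value predictions rather than on task-specific events, it is task-independent in the sense the abstract describes.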
王浩, 汪京, 方宝富. 稀疏奖励场景下基于个体落差情绪的多智能体协作算法[J]. 模式识别与人工智能, 2022, 35(5): 451-460.
WANG Hao, WANG Jing, FANG Baofu. Multi-agent Cooperation Algorithm Based on Individual Gap Emotion in Sparse Reward Scenarios. Pattern Recognition and Artificial Intelligence, 2022, 35(5): 451-460.