Pattern Recognition and Artificial Intelligence, 2022, Vol. 35, Issue (5): 451-460    DOI: 10.16451/j.cnki.issn1003-6059.202205006
Research and Applications
Multi-agent Cooperation Algorithm Based on Individual Gap Emotion in Sparse Reward Scenarios
WANG Hao1,2, WANG Jing1,2, FANG Baofu1,2
1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601;
2. Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine, Hefei University of Technology, Hefei 230601

Abstract: To address the sparse reward problem faced by reinforcement learning in multi-agent environments, a multi-agent cooperation algorithm based on individual gap emotion is proposed, drawing on the role of emotions in human learning and decision making. The approximate joint action value function is optimized end-to-end to train individual policies, and the individual action value function of each agent is taken as its evaluation of an event. A gap emotion arises from the discrepancy between the predicted evaluation and the actual outcome. This gap emotion model serves as an intrinsic motivation mechanism, generating an intrinsic emotional reward for each agent as an effective supplement to the extrinsic reward and thereby alleviating the problem of sparse extrinsic rewards. Moreover, the intrinsic emotional reward is independent of the specific task and therefore possesses a degree of generality. The effectiveness and robustness of the proposed algorithm are verified in multi-agent pursuit scenarios with different levels of reward sparsity.
Key words: Sparse Reward    Multi-agent Cooperation    Reinforcement Learning    Individual Gap Emotion    Intrinsic Emotional Reward
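As a rough illustration of the gap-emotion mechanism summarized in the abstract, the minimal sketch below computes an intrinsic reward from the gap between an agent's predicted evaluation (its individual action value for the chosen action) and the actual outcome, and adds it to the sparse extrinsic reward. This is not the authors' code: the names (gap_emotion_reward, shaped_reward, beta, gamma), the TD-style target, and the tanh squashing are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (assumed, not from the paper): a TD-style "gap emotion"
# intrinsic reward for a single agent.

def gap_emotion_reward(q_pred, r_ext, q_next_max, gamma=0.99):
    """Gap between the agent's predicted evaluation (its individual Q-value
    for the chosen action) and the actual outcome (bootstrapped target)."""
    actual = r_ext + gamma * q_next_max   # what actually happened
    gap = actual - q_pred                 # positive: better than expected
    return np.tanh(gap)                   # bounded intrinsic emotion signal

def shaped_reward(r_ext, q_pred, q_next_max, beta=0.1, gamma=0.99):
    """Extrinsic reward supplemented by the intrinsic emotional reward,
    which stays informative even when r_ext is zero (sparse reward)."""
    r_int = gap_emotion_reward(q_pred, r_ext, q_next_max, gamma)
    return r_ext + beta * r_int

# Example: in a sparse step (r_ext = 0) the agent still receives a non-zero
# learning signal from the gap between expectation and outcome.
print(shaped_reward(r_ext=0.0, q_pred=0.5, q_next_max=0.8))
```

In this sketch a positive gap (outcome better than expected) yields a positive emotional signal, and the weighting factor beta controls how strongly the intrinsic term supplements the extrinsic reward.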
Received: 2021-09-06
CLC Number: TP181
Fund: Supported by National Natural Science Foundation of China (No.61872327) and the Open Fund of the Key Laboratory of Flight Techniques and Flight Safety of Civil Aviation (No.FZ2020KF07)
Corresponding author: FANG Baofu, Ph.D., associate professor. Research interests: multi-robot/multi-agent systems, emotional agents, reinforcement learning. E-mail: fangbf@hfut.edu.cn.
About the authors: WANG Hao, Ph.D., professor. Research interests: artificial intelligence, robotics. E-mail: jsjxwangh@hfut.edu.cn.
WANG Jing, master's student. Research interests: multi-agent reinforcement learning, emotional agents. E-mail: wangj@mail.hfut.edu.cn.
Cite this article:
WANG Hao, WANG Jing, FANG Baofu. Multi-agent Cooperation Algorithm Based on Individual Gap Emotion in Sparse Reward Scenarios. Pattern Recognition and Artificial Intelligence, 2022, 35(5): 451-460.
Link to this article:
http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.202205006 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2022/V35/I5/451