Pattern Recognition and Artificial Intelligence
Pattern Recognition and Artificial Intelligence, 2024, Vol. 37, Issue (5): 435-446    DOI: 10.16451/j.cnki.issn1003-6059.202405005
Papers and Reports
Multi-agent Reinforcement Learning Algorithm Based on State Space Exploration in Sparse Reward Scenarios
FANG Baofu1,2, YU Tingting1,2, WANG Hao1,2, WANG Zaijun3
1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601;
2. Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine, Hefei University of Technology, Hefei 230601;
3. Key Laboratory of Flight Techniques and Flight Safety, Civil Aviation Flight University of China, Guanghan 618307

Abstract: In multi-agent task scenarios, a large and diverse state space is often encountered, and in some cases the reward information provided by the external environment is extremely limited, exhibiting sparse-reward characteristics. Most existing multi-agent reinforcement learning algorithms show limited effectiveness in such sparse-reward scenarios, because relying only on accidentally discovered reward sequences makes the learning process slow and inefficient. To address this issue, a multi-agent reinforcement learning algorithm based on state space exploration (MASSE) in sparse reward scenarios is proposed. MASSE constructs a subset space of states, maps one state out of this subset and takes it as an intrinsic goal, enabling agents to utilize the state space more fully and reduce unnecessary exploration. Agent states are decomposed into self-states and environmental states, and intrinsic rewards based on mutual information are generated by combining these two types of states with the intrinsic goal. By constructing the state subset space and the mutual-information-based intrinsic rewards, states close to the goal state and states that reflect an understanding of the environment are rewarded appropriately. Agents are thus motivated to move toward the goal more actively while enhancing their understanding of the environment, which guides them to adapt flexibly to sparse-reward scenarios. Experimental results indicate that MASSE achieves superior performance in multi-agent collaborative scenarios with varying degrees of reward sparsity.
Key words: Reinforcement Learning, Sparse Reward, Mutual Information, Intrinsic Rewards
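The abstract outlines two mechanisms: selecting an intrinsic goal from a constructed subset of the state space, and shaping a mutual-information-based intrinsic reward from decomposed self/environment states. Since only the abstract is reproduced here, the following Python sketch is purely illustrative: the subset construction (random representatives), the goal rule (least-visited representative), the InfoNCE-style surrogate for mutual information, and all names and coefficients (build_state_subset, select_goal, intrinsic_reward, alpha, beta) are assumptions, not the MASSE implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_state_subset(visited_states, k=16):
    """Approximate a 'state subset space' with k representative states
    sampled from the visitation buffer (a stand-in for the paper's
    construction, which the abstract does not specify)."""
    idx = rng.choice(len(visited_states), size=min(k, len(visited_states)),
                     replace=False)
    return visited_states[idx]

def select_goal(subset, visit_counts):
    """Map one state out of the subset as the intrinsic goal; here the
    least-visited representative, so exploration is directed rather
    than uniform (an illustrative rule, not the paper's)."""
    return subset[np.argmin(visit_counts)]

def infonce_mi_bound(x, y, temperature=1.0):
    """Generic InfoNCE lower bound on I(X; Y) over paired rows
    x[i] <-> y[i], using a negative squared-distance critic. A common
    mutual-information surrogate, not necessarily the paper's estimator."""
    d = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # pairwise distances
    logits = -d / temperature
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return np.log(len(x)) + np.diag(log_p).mean()

def intrinsic_reward(self_state, goal, env_batch, goal_batch,
                     alpha=1.0, beta=0.1):
    """Combine progress of the agent's own state toward the intrinsic
    goal with an environment-understanding term (MI between environment
    states and goals). Coefficients alpha/beta are illustrative."""
    goal_term = -np.linalg.norm(self_state - goal)
    mi_term = infonce_mi_bound(env_batch, goal_batch)
    return alpha * goal_term + beta * mi_term

# Toy usage on random 2-D states.
states = rng.normal(size=(128, 2))
subset = build_state_subset(states)
goal = select_goal(subset, visit_counts=rng.integers(1, 10, size=len(subset)))
env_batch = states[:32]
goal_batch = env_batch + rng.normal(scale=0.1, size=env_batch.shape)  # correlated pairs
print(intrinsic_reward(states[0], goal, env_batch, goal_batch))
```

In an actual training loop, a reward of this shape would presumably be added to each agent's sparse extrinsic reward at every transition, with the goal re-selected periodically as the visitation buffer grows.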
Received: 2024-04-07
CLC Number: TP391
Supported by National Natural Science Foundation of China (No.61872327), Natural Science Foundation of Anhui Province (No.2308085MF203), University Synergy Innovation Program of Anhui Province (No.GXXT-2022-055), and Open Fund of Key Laboratory of Flight Techniques and Flight Safety (No.FZ2022KF09)
Corresponding author: FANG Baofu, Ph.D., associate professor. Research interests: intelligent robot systems. E-mail: fangbf@hfut.edu.cn.
About the authors: YU Tingting, master student. Research interests: multi-agent deep reinforcement learning. E-mail: 185137760@qq.com. WANG Hao, Ph.D., professor. Research interests: distributed intelligent systems and robotics. E-mail: jsjxwangh@hfut.edu.cn. WANG Zaijun, master, research fellow. Research interests: multi-robot task allocation and artificial intelligence. E-mail: tiantian20030315@126.com.
Cite this article:
FANG Baofu, YU Tingting, WANG Hao, WANG Zaijun. Multi-agent Reinforcement Learning Algorithm Based on State Space Exploration in Sparse Reward Scenarios. Pattern Recognition and Artificial Intelligence, 2024, 37(5): 435-446.
Link to this article:
http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.202405005      或     http://manu46.magtech.com.cn/Jweb_prai/CN/Y2024/V37/I5/435
Copyright © Editorial Office of Pattern Recognition and Artificial Intelligence
Address: No.350 Shushanhu Road, Hefei, Anhui Province    Tel: 0551-65591176    Fax: 0551-65591176    Email: bjb@iim.ac.cn