Abstract: Multi-agent reinforcement learning algorithms are difficult to adapt to environments in which the number of agents changes dynamically. To address this problem, a sequence-to-sequence multi-agent reinforcement learning algorithm (SMARL) based on sequential learning and a block structure is proposed. The control network of each agent is divided into an action network, built on the deep deterministic policy gradient (DDPG) structure, and a target network, built on the sequence-to-sequence structure, which decouples the algorithm's architecture from the number of agents. The inputs and outputs of the algorithm are likewise processed to break the dependence of the learned policy on the number of agents. Agents in SMARL can quickly adapt to new environments, take on different roles within a task, and learn rapidly. Experiments show that the proposed algorithm outperforms baseline algorithms in adaptability, performance, and training efficiency.
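The abstract describes the architecture only at a high level; the exact layer composition is not given here. The following PyTorch sketch is a minimal illustration of the core idea under stated assumptions: a GRU encoder-decoder stands in for the sequence-to-sequence target network, and a small feed-forward deterministic policy stands in for the DDPG-style action network. The class names, dimensions, and the step-by-step decoding loop are hypothetical choices for illustration, not the paper's implementation.

import torch
import torch.nn as nn


class Seq2SeqTargetNetwork(nn.Module):
    """Hypothetical sketch of a sequence-to-sequence target network.

    Encodes a variable-length sequence of per-agent observations and
    decodes one target vector per agent, so the parameter count is
    independent of how many agents are present.
    """

    def __init__(self, obs_dim: int, hidden_dim: int, target_dim: int):
        super().__init__()
        self.encoder = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(target_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, target_dim)

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, n_agents, obs_dim); n_agents may differ per call.
        _, hidden = self.encoder(obs_seq)
        batch, n_agents, _ = obs_seq.shape
        step = obs_seq.new_zeros(batch, 1, self.out.out_features)
        targets = []
        for _ in range(n_agents):
            # Decode one target per agent, feeding back the previous output.
            dec_out, hidden = self.decoder(step, hidden)
            step = self.out(dec_out)           # (batch, 1, target_dim)
            targets.append(step)
        return torch.cat(targets, dim=1)       # (batch, n_agents, target_dim)


class ActionNetwork(nn.Module):
    """DDPG-style deterministic policy shared across agents (sketch)."""

    def __init__(self, obs_dim: int, target_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + target_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh(),
        )

    def forward(self, obs: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, target], dim=-1))


if __name__ == "__main__":
    # The same networks handle any number of agents without reshaping,
    # which is the scale-independence property the abstract claims.
    tgt_net = Seq2SeqTargetNetwork(obs_dim=8, hidden_dim=32, target_dim=4)
    act_net = ActionNetwork(obs_dim=8, target_dim=4, act_dim=2)
    for n_agents in (3, 10):
        obs = torch.randn(2, n_agents, 8)      # batch of 2 episodes
        targets = tgt_net(obs)                 # (2, n_agents, 4)
        actions = act_net(obs, targets)        # (2, n_agents, 2)
        print(n_agents, actions.shape)

The usage loop above is the point of the construction: because the target network consumes the agents as a sequence and the action network is applied per agent, neither network's weights reference the agent count, so the same trained model can be run unchanged when agents join or leave.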