Policy Gradient Algorithm in Two-Player Zero-Sum Markov Games
LI Yongqiang1, ZHOU Jian1, FENG Yu1, FENG Yuanjing1
Nash convergence metric curves for EG-R and EG-R with a baseline