Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
Pattern Recognition and Artificial Intelligence  2023, Vol. 36 Issue (1): 81-91    DOI: 10.16451/j.cnki.issn1003-6059.202301007
Policy Gradient Algorithm in Two-Player Zero-Sum Markov Games
LI Yongqiang¹, ZHOU Jian¹, FENG Yu¹, FENG Yuanjing¹
1. College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023

Abstract  In two-player zero-sum Markov games, each player's policy influences the other's, so the traditional policy gradient theorem can only be applied by training the two players alternately. To train both players simultaneously, a policy gradient theorem for two-player zero-sum Markov games is proposed. Based on this theorem, an extra-gradient based REINFORCE algorithm is proposed so that the joint policy of the two players converges to an approximate Nash equilibrium. The advantages of the proposed algorithm are analyzed from three aspects. First, comparative experiments on a simultaneous-move game show that the proposed algorithm converges more reliably and faster than the compared algorithms. Second, the characteristics of the joint policies obtained by the proposed algorithm are analyzed, and these joint policies are verified to reach an approximate Nash equilibrium. Finally, comparative experiments on simultaneous-move games of different difficulty levels show that the proposed algorithm maintains a good convergence speed at higher difficulty levels.
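The extra-gradient idea mentioned in the abstract can be illustrated on a much simpler problem than the one the paper studies. The sketch below is not the authors' algorithm: it assumes a single-state zero-sum matrix game (matching pennies), softmax policies, and exact expected-payoff gradients in place of sampled REINFORCE returns. All function names (`payoff_gradients`, `exploitability`, `train`) are hypothetical. Both players are updated simultaneously; the extra-gradient variant first takes a look-ahead step and then applies the gradient evaluated at the look-ahead point to the original parameters.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def payoff_gradients(A, theta, phi):
    """Exact gradients of V = x^T A y with respect to both players' logits.

    Assumption: exact expectations replace the sampled returns that a
    REINFORCE estimator would use.
    """
    x, y = softmax(theta), softmax(phi)
    n, m = len(x), len(y)
    q = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]  # A y
    r = [sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]  # A^T x
    v = sum(x[i] * q[i] for i in range(n))                         # x^T A y
    g_theta = [x[i] * (q[i] - v) for i in range(n)]
    g_phi = [y[j] * (r[j] - v) for j in range(m)]
    return g_theta, g_phi


def exploitability(A, theta, phi):
    """Sum of both players' best-response gains; zero at a Nash equilibrium."""
    x, y = softmax(theta), softmax(phi)
    n, m = len(x), len(y)
    best_for_max = max(sum(A[i][j] * y[j] for j in range(m)) for i in range(n))
    best_for_min = min(sum(A[i][j] * x[i] for i in range(n)) for j in range(m))
    return best_for_max - best_for_min


def step(theta, phi, g_theta, g_phi, lr):
    """Player 1 ascends V, player 2 descends V, both in the same step."""
    new_theta = [t + lr * g for t, g in zip(theta, g_theta)]
    new_phi = [p - lr * g for p, g in zip(phi, g_phi)]
    return new_theta, new_phi


def train(A, theta, phi, lr, steps, extra_gradient):
    """Simultaneous policy-gradient updates, optionally with extra-gradient."""
    for _ in range(steps):
        g_theta, g_phi = payoff_gradients(A, theta, phi)
        if extra_gradient:
            # Look-ahead step, then re-evaluate the gradient at that point.
            theta_half, phi_half = step(theta, phi, g_theta, g_phi, lr)
            g_theta, g_phi = payoff_gradients(A, theta_half, phi_half)
        # Apply the (possibly look-ahead) gradient to the original parameters.
        theta, phi = step(theta, phi, g_theta, g_phi, lr)
    return theta, phi


if __name__ == "__main__":
    matching_pennies = [[1.0, -1.0], [-1.0, 1.0]]
    theta0, phi0 = [1.0, 0.0], [0.0, 0.5]
    for use_eg in (False, True):
        th, ph = train(matching_pennies, theta0, phi0, 0.2, 5000, use_eg)
        print("extra-gradient" if use_eg else "plain", exploitability(matching_pennies, th, ph))
```

In this toy setting, plain simultaneous updates circle the uniform equilibrium without approaching it, while the look-ahead correction makes the iterates spiral inward for a small step size. The `exploitability` value plays the role of an approximate-Nash check: a joint policy is an approximate Nash equilibrium when neither player can gain much by deviating unilaterally.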
Key words: Markov Game, Zero-Sum Game, Policy Gradient Theorem, Approximate Nash Equilibrium
Received: 05 August 2022     
Chinese Library Classification (ZTFLH): TP18
Fund: General Program of National Natural Science Foundation of China (No. 62073294), Natural Science Foundation of Zhejiang Province (No. LZ21F030003)
Corresponding Author: LI Yongqiang, Ph.D., associate professor. His research interests include artificial intelligence, reinforcement learning and game theory.
About authors: ZHOU Jian, master's student. His research interests include reinforcement learning and Markov games. FENG Yu, Ph.D., professor. His research interests include artificial intelligence, reinforcement learning and game theory. FENG Yuanjing, Ph.D., professor. His research interests include artificial intelligence, image processing and intelligent optimization.
Cite this article:   
LI Yongqiang, ZHOU Jian, FENG Yu, et al. Policy Gradient Algorithm in Two-Player Zero-Sum Markov Games[J]. Pattern Recognition and Artificial Intelligence, 2023, 36(1): 81-91.
URL:  
http://manu46.magtech.com.cn/Jweb_prai/EN/10.16451/j.cnki.issn1003-6059.202301007      OR     http://manu46.magtech.com.cn/Jweb_prai/EN/Y2023/V36/I1/81