Learning in Two-Player Matrix Games by Policy Gradient Lagging Anchor
Shiyao DING, Toshimitsu USHIO
IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences
Publication Date: 2019/04/01
Online ISSN: 1745-1337
Type of Manuscript: LETTER
Category: Mathematical Systems Science
Keywords: reinforcement learning, policy gradient, multi-agent systems, matrix game
It is known that the policy gradient algorithm cannot guarantee convergence to a Nash equilibrium in mixed policies when it is applied to matrix games. To overcome this problem, we propose a novel multi-agent reinforcement learning (MARL) algorithm called the policy gradient lagging anchor (PGLA) algorithm. We prove that, in two-player two-action matrix games, the agents' policies can converge to a Nash equilibrium in mixed policies under the PGLA algorithm. Simulations confirm this convergence and show that the PGLA algorithm has better convergence behavior than the LR-I lagging anchor algorithm.
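The abstract contrasts plain policy gradient, which need not converge to a mixed Nash equilibrium, with an update that adds a lagging anchor. The Python sketch below illustrates that contrast under stated assumptions; it is not the authors' PGLA update, whose exact form is given in the full paper. It assumes exact expected-payoff gradients rather than sampled ones, uses matching pennies as a hypothetical example game (mixed equilibrium at p = q = 0.5), and assumes an anchor term of the form eta * (p_bar - p), with the anchor p_bar itself moving slowly toward p. The names `simulate`, `payoff_gradients`, `alpha`, and `eta` are illustrative only.

```python
import math

# Hypothetical example game (not from the source): matching pennies.
# Row player's payoff matrix; the column player receives the negative.
A = [[1.0, -1.0],
     [-1.0, 1.0]]


def payoff_gradients(p, q):
    """Exact gradients of each player's expected payoff.

    p: probability that player 1 plays action 0
    q: probability that player 2 plays action 0
    """
    # V1 = p*q*A00 + p*(1-q)*A01 + (1-p)*q*A10 + (1-p)*(1-q)*A11
    g1 = q * (A[0][0] - A[1][0]) + (1 - q) * (A[0][1] - A[1][1])
    # Zero-sum: V2 = -V1, so dV2/dq = -dV1/dq
    g2 = -(p * (A[0][0] - A[0][1]) + (1 - p) * (A[1][0] - A[1][1]))
    return g1, g2


def clip01(x):
    """Project a value back onto the probability interval [0, 1]."""
    return min(1.0, max(0.0, x))


def simulate(use_anchor, steps=5000, alpha=0.01, eta=1.0, p0=0.7, q0=0.4):
    """Run projected gradient ascent, optionally with an assumed lagging anchor."""
    p, q = p0, q0
    p_bar, q_bar = p0, q0  # anchors start at the initial policies
    for _ in range(steps):
        g1, g2 = payoff_gradients(p, q)
        if use_anchor:
            # Each policy follows its payoff gradient and is pulled toward its anchor.
            new_p = p + alpha * (g1 + eta * (p_bar - p))
            new_q = q + alpha * (g2 + eta * (q_bar - q))
            # Each anchor lags behind, moving toward the old policy (simultaneous update).
            p_bar += alpha * eta * (p - p_bar)
            q_bar += alpha * eta * (q - q_bar)
        else:
            # Plain gradient ascent on each player's own expected payoff.
            new_p = p + alpha * g1
            new_q = q + alpha * g2
        p, q = clip01(new_p), clip01(new_q)
    return p, q


if __name__ == "__main__":
    for flag in (False, True):
        p, q = simulate(use_anchor=flag)
        dist = math.hypot(p - 0.5, q - 0.5)
        print(f"anchor={flag}: p={p:.4f}, q={q:.4f}, distance to mixed NE={dist:.4f}")
```

With these assumed settings, the plain update spirals outward to a cycle near the boundary of the strategy square, while the anchored update settles at the mixed equilibrium (0.5, 0.5). The anchor adds damping that the plain gradient dynamics of this game lack, since their continuous-time orbits are closed cycles around the equilibrium.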