July 2022 - Land of GPT

Why do Policy Gradient Methods work so well in Cooperative MARL? Evidence from Policy Representation

In cooperative multi-agent reinforcement learning (MARL), policy gradient (PG) methods are, because of their on-policy nature, typically believed to be less sample-efficient than value decomposition (VD) methods, which are…