Why do Policy Gradient Methods work so well in Cooperative MARL? Evidence from Policy Representation
In cooperative multi-agent reinforcement learning (MARL), policy gradient (PG) methods, due to their on-policy nature, are typically believed to be less sample efficient than value decomposition (VD) methods, which are…