Model-Based Co-Training for Multi-Agent RL
A multi-agent reinforcement learning method that pairs model-based planning with opponent modeling, so a policy keeps improving even as the agents around it learn and change.
- The problem. Multi-agent environments are non-stationary: every other agent is learning too, so a policy that is optimal against today's opponents is stale against tomorrow's.
- The approach. Co-train a latent model of the other agents' strategies alongside the policy, then plan against that model as the opponents adapt. This couples model-based rollouts with opponent modeling instead of treating the other agents as fixed environment dynamics.
- Why it matters. Anticipating how co-players will shift, rather than chasing them a step behind, is what separates coordination from reaction in multi-agent settings.
- Status. Co-authored manuscript in preparation.
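
The idea of planning against a model of the opponent can be illustrated with a toy sketch. This is not the method from the manuscript: an exponentially weighted action-frequency estimate stands in for the latent strategy model, and a one-step best response stands in for model-based rollouts. The payoff matrix, the `OpponentModel` class, the `plan` and `run` functions, and the opponent's switching schedule are all hypothetical, chosen only to show the agent re-adapting when the opponent changes strategy.

```python
import random

# Hypothetical 2x2 payoff matrix for the learner: PAYOFF[my_action][opp_action].
# A coordination game: matching the opponent's action pays 1, otherwise 0.
PAYOFF = [[1.0, 0.0],
          [0.0, 1.0]]


class OpponentModel:
    """Exponentially weighted estimate of the opponent's action distribution.

    Assumption: a simple stand-in for a learned latent strategy model.
    """

    def __init__(self, n_actions, lr=0.1):
        self.probs = [1.0 / n_actions] * n_actions
        self.lr = lr

    def update(self, observed_action):
        # Move the estimate toward the one-hot vector of the observed action.
        for a in range(len(self.probs)):
            target = 1.0 if a == observed_action else 0.0
            self.probs[a] += self.lr * (target - self.probs[a])


def plan(model, payoff):
    """Pick the action with the highest expected payoff under the model."""
    expected = [
        sum(p * row[o] for o, p in enumerate(model.probs)) for row in payoff
    ]
    return max(range(len(expected)), key=expected.__getitem__)


def run(steps=400, seed=0):
    rng = random.Random(seed)
    model = OpponentModel(n_actions=2)
    total = 0.0
    for t in range(steps):
        # Non-stationary opponent: mostly action 0 in the first half,
        # mostly action 1 in the second half.
        p_one = 0.1 if t < steps // 2 else 0.9
        my_action = plan(model, PAYOFF)          # act on the current model
        opp_action = 1 if rng.random() < p_one else 0
        total += PAYOFF[my_action][opp_action]
        model.update(opp_action)                 # then refine the model
    return total / steps, model.probs


if __name__ == "__main__":
    avg_payoff, probs = run()
    print(f"average payoff: {avg_payoff:.2f}, final opponent estimate: {probs}")
```

A policy frozen after the first half would keep playing action 0 and lose most rounds once the opponent switches; updating the model each step lets the planner follow the change after a short lag. Anticipating the shift before it happens, as the approach above describes, would need a model of how the opponent's strategy evolves, which this sketch does not attempt.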