Model-Based Co-Training for Multi-Agent RL

multi-agent RL · model-based RL · opponent modeling

A multi-agent reinforcement learning method that pairs model-based planning with opponent modeling, so a policy keeps improving even as the agents around it learn and change.

The problem. Multi-agent environments are non-stationary: every other agent is learning too, so a policy that is optimal against today's opponents is stale against tomorrow's.
The approach. Co-train a latent model of the other agents' strategies alongside the policy, then plan against that model as the opponents adapt. This couples model-based rollouts with opponent modeling, rather than treating the other agents as fixed environment dynamics.
Why it matters. Anticipating how co-players will shift, rather than chasing them a step behind, is what separates coordination from reaction in multi-agent settings.
Status. Co-authored manuscript in preparation.
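
Illustrative sketch. To make the approach above concrete, here is a toy loop in Python that updates an opponent model online and plans against it at every step. It is not the method from the manuscript. Everything in it is an assumption chosen for illustration: the rock-paper-scissors payoff matrix, the exponential-moving-average frequency model standing in for a latent strategy model, the one-step lookahead standing in for multi-step model-based rollouts, and the hypothetical names `OpponentModel`, `plan`, `drifting_opponent`, and `run`.

```python
import random

# Row player's payoffs for rock-paper-scissors: PAYOFF[my_action][opp_action].
# Actions: 0 = rock, 1 = paper, 2 = scissors. (Toy game, assumed for illustration.)
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


class OpponentModel:
    """Exponential moving average over the opponent's observed actions.

    A stand-in for a learned latent strategy model; the decay lets the
    estimate forget old behaviour as the opponent changes.
    """

    def __init__(self, n_actions, decay=0.9):
        self.probs = [1.0 / n_actions] * n_actions
        self.decay = decay

    def update(self, opp_action):
        for a in range(len(self.probs)):
            target = 1.0 if a == opp_action else 0.0
            self.probs[a] = self.decay * self.probs[a] + (1 - self.decay) * target


def plan(model):
    """One-step lookahead: the action with the highest expected payoff under the model."""
    expected = [sum(p * row[o] for o, p in enumerate(model.probs)) for row in PAYOFF]
    return max(range(len(expected)), key=expected.__getitem__)


def drifting_opponent(t, rng):
    """Hypothetical non-stationary opponent: favours a different action every 200 steps."""
    favourite = (t // 200) % 3
    return favourite if rng.random() < 0.8 else rng.randrange(3)


def run(steps=600, seed=0):
    """Interleave planning and model updates; return the average payoff per step."""
    rng = random.Random(seed)
    model = OpponentModel(n_actions=3)
    total = 0
    for t in range(steps):
        action = plan(model)                 # plan against the current model
        opp_action = drifting_opponent(t, rng)
        total += PAYOFF[action][opp_action]
        model.update(opp_action)             # keep the model tracking the opponent
    return total / steps
```

Calling `run()` returns the planner's average payoff against the drifting opponent. Because the model discounts old observations, the planner's best response shifts a few steps after each change in the opponent's favourite action. A static frequency count would adapt far more slowly, which is the non-stationarity problem in miniature.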