SPACeR: Self-Play Anchoring with Centralized Reference Models — ICLR 2026 Publication

February 19, 2026

Realistic traffic simulation is critical to safely and efficiently developing autonomous vehicles. To be truly effective, simulations must strike a balance — running at scale while accurately capturing the subtle, human-like behaviors that define real-world traffic.

Existing methods tend to fall short on one side of this tradeoff. Large data-driven models generate human-like behaviors but are slow and expensive, while scalable self-play simulations often drift away from human driving norms.

This blog post introduces SPACeR, a new framework that bridges this gap. SPACeR combines the speed and scalability of self-play with the realism of human driving data by anchoring lightweight simulation agents to a pretrained human behavior model. The result: interactive, human-like traffic at scale — up to 10× faster and roughly 50× smaller than leading generative approaches.

Motivation

Developing autonomous vehicles safely and at scale depends on high‑fidelity simulation. Simulated traffic must not only look realistic but also interact realistically — responding to merges, lane changes, and right‑of‑way decisions the way human drivers would.

Most existing approaches fall into two camps. Large, data-driven models such as diffusion or Transformer-based simulators can reproduce detailed human behavior but are computationally heavy and slow to run. On the other hand, self-play methods — where agents learn by interacting with each other in simulation — scale efficiently but often lose touch with human driving norms over time.

Our goal with SPACeR is to bring these strengths together: combining the realism of data-driven human models with the scalability and adaptability of self-play simulation.

What is Self-Play?

Self-play refers to training agents by repeatedly interacting with each other in simulation, allowing complex multi-agent behaviors to emerge naturally (similar to how AlphaGo policies learn by playing against different versions of themselves).
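To make the idea concrete, here is a minimal toy sketch of a self-play rollout, not SPACeR's actual training loop. It assumes a circular single-lane road where every agent is driven by the same shared policy and observes only its own speed and the gap to the vehicle ahead. The names `Agent`, `observe`, `self_play_rollout`, and `cautious_policy` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Observation = Tuple[float, float]        # (own speed, gap to the agent ahead)
Policy = Callable[[Observation], float]  # maps an observation to an acceleration


@dataclass
class Agent:
    position: float
    speed: float


def observe(agents: List[Agent], i: int, road_length: float) -> Observation:
    """Local observation for agent i: its speed and the gap to its leader."""
    leader = agents[(i + 1) % len(agents)]
    gap = (leader.position - agents[i].position) % road_length
    return (agents[i].speed, gap)


def self_play_rollout(policy: Policy, agents: List[Agent], steps: int,
                      road_length: float = 100.0, dt: float = 0.1):
    """Roll out one shared policy for every agent and collect its experience."""
    experience = []
    for _ in range(steps):
        # Every agent observes the same world state, then all act simultaneously.
        observations = [observe(agents, i, road_length) for i in range(len(agents))]
        actions = [policy(obs) for obs in observations]
        for agent, obs, accel in zip(agents, observations, actions):
            agent.speed = max(0.0, agent.speed + accel * dt)
            agent.position = (agent.position + agent.speed * dt) % road_length
            experience.append((obs, accel))
    return experience


def cautious_policy(obs: Observation) -> float:
    """Hand-written stand-in for a learned policy: brake when the gap is short."""
    speed, gap = obs
    return 1.0 if gap > 2.0 * speed + 5.0 else -2.0


agents = [Agent(position=p, speed=5.0) for p in (0.0, 30.0, 60.0)]
experience = self_play_rollout(cautious_policy, agents, steps=50)
print(len(experience), "transitions collected from", len(agents), "agents")
```

In a real training setup, the collected transitions would feed a reinforcement learning update of the shared policy, so each agent's behavior improves against ever-improving copies of itself.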

Method: Human-Like Self-Play

SPACeR agents are lightweight, decentralized policies that learn through self-play by repeatedly interacting with each other in closed-loop simulation, enabling scalable multi-agent behavior. To preserve realism, SPACeR anchors self-play to a pretrained token-based generative model trained on real-world driving data, which captures complex and interactive human driving behaviors. During training, the agents’ trajectories are continually compared against this reference model, providing a signal of “human realism” that steers learning toward natural, socially aware driving patterns. The reference model is used only during training, so deployed simulations remain fast, efficient, and highly scalable.
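One way to picture the anchoring signal is as a penalty on how far the agent's action distribution strays from the reference model's. The sketch below is an illustration under stated assumptions, not the published objective: it assumes both models output a probability distribution over the same discrete action tokens, and it uses a KL-divergence penalty scaled by a hypothetical `anchor_weight`. The function names are likewise illustrative.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same action tokens."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def anchored_reward(task_reward: float,
                    policy_probs: Sequence[float],
                    reference_probs: Sequence[float],
                    anchor_weight: float) -> float:
    """Task reward minus a penalty for diverging from the reference model.

    Assumption: the anchoring term is a KL penalty. A larger anchor_weight
    pulls the agent closer to the human-behavior reference.
    """
    penalty = kl_divergence(policy_probs, reference_probs)
    return task_reward - anchor_weight * penalty


# Example: three action tokens (e.g. brake, hold, accelerate).
policy_probs = [0.7, 0.2, 0.1]
reference_probs = [0.5, 0.3, 0.2]

for weight in (0.0, 0.5, 2.0):
    reward = anchored_reward(1.0, policy_probs, reference_probs, weight)
    print(f"anchor_weight={weight}: reward={reward:.4f}")
```

Because the reference distribution enters only the training reward, nothing in this sketch needs to run at inference time, which matches the point above that the reference model is used only during training.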

Results

In our experiments, SPACeR agents are significantly faster and more reactive than state-of-the-art imitation-learned models, surfacing an important dimension of traffic simulation quality that traditional metrics often overlook. This improved reactivity translates into more realistic behavior when agents are run in closed loop, especially as scenarios deviate from the original logged data. Together, these properties make SPACeR a strong fit for large-scale, high-fidelity simulation workloads where both realism and throughput matter.

Integration into Applied Intuition's Simulation Products

To achieve the goal of validating ADAS and autonomous driving products entirely in simulation, we need traffic agents that behave more like real drivers and less like scripted placeholders. Integrating SPACeR into Applied Intuition’s simulation products enables exactly that: agents that are reactive (closed-loop), human-like (anchored to real data), and highly scalable (fast to run across large fleets of scenarios).

On the Waymo Sim Agents Challenge, our approach improves human-likeness over prior self-play methods, increasing realism scores from about 0.70 to 0.74. Compared to leading imitation-learning models such as SMART, we deliver nearly 10× faster inference (over 200 scenarios per second on a single GPU) while significantly reducing collision and off-road behavior.
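To put these figures in perspective, the short calculation below converts them into a relative realism gain and a rough compute budget. It is back-of-the-envelope arithmetic only: the one-million-scenario workload is a hypothetical example, and because the quoted throughput is a lower bound ("over 200 scenarios per second"), the resulting GPU time is an upper bound.

```python
def simulation_hours(num_scenarios: int, scenarios_per_second: float) -> float:
    """GPU-hours needed to run num_scenarios at a fixed throughput."""
    return num_scenarios / scenarios_per_second / 3600.0


def relative_gain(before: float, after: float) -> float:
    """Relative improvement of a score, as a fraction of the starting value."""
    return (after - before) / before


hours = simulation_hours(1_000_000, 200.0)  # hypothetical workload size
gain = relative_gain(0.70, 0.74)            # realism scores quoted above

print(f"{hours:.2f} GPU-hours for one million scenarios")  # prints 1.39
print(f"{gain:.1%} relative realism improvement")          # prints 5.7%
```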

Integrating SPACeR creates a single, unified "intelligence layer" across Applied Intuition’s entire simulation suite. By replacing static log replays and rigid rule-based models with reactive, human-like agents, we address a fundamental challenge of simulation validity: ensuring that background traffic behaves realistically in response to the ego vehicle’s decisions.

This integration delivers three core benefits that enable us to replace expensive on-road testing with high-confidence simulation:

  • True closed-loop reactivity: SPACeR replaces passive log replay with socially aware agents that respond to the ego vehicle’s choices in real time—yielding, merging, and negotiating right-of-way like human drivers, so bad plans are penalized and good ones are reinforced.
  • Massive scalability & speed: Because SPACeR is up to 10× faster and far smaller than standard generative models, teams can scale from thousands to millions of virtual miles, running rich urban scenarios in Object Sim, Log Sim, and Neural Sim without prohibitive compute costs.
  • Natural adversarial testing: By adjusting how strongly agents are anchored to human data, SPACeR can generate realistic but challenging edge cases, such as aggressive cut-ins or inattentive pedestrians, allowing ADAS and autonomy stacks to be stress-tested on rare, safety-critical situations that are difficult or dangerous to reproduce on public roads.


SPACeR demonstrates that human log data and self-play training can be combined to produce realistic, scalable traffic agents, bringing a unified, human-like intelligence layer to Applied Intuition’s simulation suite. At the same time, it opens up clear future directions in controllability, improved treatment of heterogeneous agents (such as trucks, pedestrians, and cyclists), and robustness to noise and distribution shifts. If you’re interested in deploying SPACeR-powered agents in your Object Sim, Log Sim, or Neural Sim workflows, contact our team or reach out to your Applied Intuition representative.

More details can be found on the project website and in the preprint, as well as in our upcoming presentation at ICLR 2026.