Physical AI Series: Part 3: Who Owns the Assembly


2026-05-26T00:00:00.000Z


The three camps in physical AI aren’t competing. They’re becoming layers of the same stack. The question is who owns the layer where it all comes together.



the defensible moat is the eval-and-deployment flywheel. three camps are becoming layers of the same stack.

the three camps in physical ai are real-world data, simulation, and foundation models. they're becoming layers of the same stack rather than competing approaches. the question was always in what order they get assembled, and who owns the assembly.

the assembly itself is the most defensible layer. specifically: the eval-and-deployment flywheel, where regression testing, observability, and supervised teleoperation turn live operation back into the next training cycle. pre-training becomes horizontal. simulation concentrates around a handful of leaders. real-data suppliers consolidate. the layer that compounds with every deployed robot hour, and costs months of regression risk to switch out of, supports several durable category leaders.

this piece sets out the case for that view.

how the camps converge

the architecture different labs are building, separately, has the same shape.

pre-training sits at the top, drawing on web video, foundation-model knowledge, and increasingly on neural world models for environment scale. hybrid simulation comes next, combining physics-grounded engines with generative environments. below that sits targeted real-world data, used to fine-tune for specific deployments. the eval-and-deployment flywheel sits at the bottom. this is the layer where the assembly actually happens.

several recent moves point to this convergence: the Lightwheel-World Labs Real-to-Sim partnership, which pairs physics-anchored simulation with generative world creation; NVIDIA's Cosmos and the Newton physics engine (co-developed by NVIDIA, Google DeepMind and Disney Research), both available as open source; and foundation-model labs combining vision-language-action models (VLAs) with world models inside the same product. three years from now, treating these as separate categories will look like a historical artifact. they're already converging in how the stack actually works.

inside the simulation layer, a near-term window remains open for task-specific approaches: per-skill generative environments trained on small amounts of real footage, deployable today before generalist world models reach maturity for contact-heavy tasks. the interesting question is which players use that window to embed in customer pipelines deeply enough to remain valuable through the transition.

why the supplier layer reinforces the thesis

there are 80+ companies supplying physical ai training data globally today. demand is concentrated among 10 to 15 serious foundation-model buyers running campaign-based procurement of hundreds of thousands of hours per cycle. supply already exceeds demand by one to two orders of magnitude. exclusive licences are starting to harden. compression toward a smaller set of leaders within three years is already visible.

the vendors gaining the most attention from labs are those layering pipeline, observability, and validation tooling on top of collection, turning fleet sensor data into queryable, normalised, learnable training signal. companies like Lightwheel and Mecka, which started in collection but are clearly building toward this infrastructure layer, look better positioned than vendors that scale collection alone. geography matters too: pre-training rewards diversity, fine-tuning rewards local specificity. suppliers building deliberate global mixes from the start outpace those that scale within one geography first.

this reinforces the central claim. vendors that move up-stack from collection into pipeline tooling are building the on-ramp to the eval-and-deployment layer. the infrastructure that makes data learnable is contiguous with the infrastructure that makes deployment safe and repeatable.

the flywheel layer

once a robotics customer has built their regression tests, ci for policy updates, fleet observability, and teleoperation-supervision tooling on top of someone's environment, switching is genuinely hard. every deployed robot hour adds to the moat.
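to make the switching cost concrete, here is a minimal sketch of the kind of regression gate a ci pipeline for policy updates might run. everything in it is an assumption for illustration: the `EvalResult` type, the `regression_gate` function, the scenario names, and the 2-point tolerance are hypothetical and not taken from any vendor's tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    scenario: str        # hypothetical scenario name, e.g. "pick_tote"
    success_rate: float  # fraction of successful trials, 0.0 to 1.0


def regression_gate(baseline, candidate, tolerance=0.02):
    """Return the sorted scenarios on which the candidate policy regresses.

    A scenario regresses if its success rate drops more than `tolerance`
    below the baseline, or if the candidate run did not cover it at all.
    """
    base = {r.scenario: r.success_rate for r in baseline}
    cand = {r.scenario: r.success_rate for r in candidate}
    regressions = []
    for scenario, base_rate in base.items():
        if scenario not in cand:
            # a missing scenario is treated as a failure, not silently skipped
            regressions.append(scenario)
        elif cand[scenario] < base_rate - tolerance:
            regressions.append(scenario)
    return sorted(regressions)


# illustrative numbers only
baseline = [
    EvalResult("pick_tote", 0.95),
    EvalResult("place_shelf", 0.90),
    EvalResult("fold_towel", 0.80),
]
candidate = [
    EvalResult("pick_tote", 0.96),
    EvalResult("place_shelf", 0.85),
]

blocked = regression_gate(baseline, candidate)
if blocked:
    print("policy update blocked, regressions in:", blocked)
```

the scenario suite is the sticky part. it is written against one vendor's environment and grows with every incident in the field, so moving to another stack means rebuilding and revalidating it.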

this is what Covariant has been compounding on for eight years across 26 customers. it is what 1X is building with NEO's Teleoperation-as-a-Service, where edge cases become training episodes the moment a human supervisor steps in. it is what Lightwheel is positioning as a candidate industry-standard benchmark. these are early plays for the layer where deployment data, evaluation, and live supervision intersect.
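the idea that a human takeover becomes a training episode can be sketched in a few lines. this is an illustrative assumption about how such a pipeline could work, not a description of any named company's system: the `Step` record, the `human_in_control` flag, and the `context` window are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    t: int                   # timestep index in the robot's log
    human_in_control: bool   # True while a teleoperator is driving


def intervention_episodes(steps, context=2):
    """Return half-open (start, end) index ranges, one per human takeover.

    Each range covers the takeover itself plus up to `context` steps before
    it, so the episode includes the lead-up to the intervention.
    """
    episodes = []
    start = None
    for i, step in enumerate(steps):
        if step.human_in_control and start is None:
            start = i  # takeover begins
        elif not step.human_in_control and start is not None:
            episodes.append((max(0, start - context), i))  # takeover ends
            start = None
    if start is not None:
        # log ended while the human was still in control
        episodes.append((max(0, start - context), len(steps)))
    return episodes


flags = [False, False, False, True, True, False, False, True]
log = [Step(t, flag) for t, flag in enumerate(flags)]
print(intervention_episodes(log))
```

each extracted range is a candidate training episode: the autonomous lead-up shows where the policy went wrong, and the teleoperated segment shows the correction.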

eval and deployment is the only layer where the moat actually compounds. every additional deployed hour is both incremental training data and incremental switching cost. pre-training gets cheaper as foundation-model substrates standardise. simulation consolidates and competes on quality. real-data suppliers concentrate and compete on geography and pipeline tooling. the flywheel layer is the only one that gets more defensible the longer it runs.

the geographic shift already happening

underneath the convergence story sits a dynamic that anyone building or investing in this stack has to take seriously. China is moving very fast on hardware and data volume.

AGIBOT has built roughly 10,000 robots in three years. Unitree and BYD are targeting tens of thousands of units in 2026. per-unit costs are dropping around 40% per year. chinese teleoperation data factories run hundreds to thousands of operators per facility. the common view among practitioners is that China leads on manufacturing, supply chain, and hardware iteration, while the US leads on foundation models, autonomy, and software infrastructure.
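a quick worked example shows what a decline of around 40% per year compounds to. the sketch assumes the rate holds constant, which is a simplification, and `relative_cost` is a hypothetical helper name.

```python
def relative_cost(years, annual_decline=0.40):
    """Cost as a fraction of today's cost after `years` of constant decline."""
    return (1.0 - annual_decline) ** years


for years in (1, 2, 3):
    print(years, round(relative_cost(years), 3))
# prints:
# 1 0.6
# 2 0.36
# 3 0.216
```

under that assumption, a unit costs about a fifth of today's price after three years, which is why cost-sensitive deployments are sensitive to who sets the hardware price curve.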

multiple dynamics run in parallel. western labs already train on data factories operated globally, including in China. chinese hardware platforms increasingly run western foundation models. cost-sensitive deployment markets will lean toward chinese hardware first, while frontier capability sits closer to western model labs. for the eval-and-deployment layer specifically, the implication is that it has to be global by default. the flywheel companies that win will be the ones whose tooling works across both hardware ecosystems. the layer that compounds with deployed hours wins regardless of which hardware those hours run on.

what this means

the convergence is already underway. what it means depends on where you sit.

for investors: single-layer bets are harder than they look. the most durable companies will be embedded across enough layers, and across enough geographies, that the stack cannot be reassembled without them. the candidates are the infrastructure-oriented suppliers moving into eval, the simulation platforms building standardised benchmarks, and the open-source projects that become the de facto reference and then commercialise the deployment surface.

for customers: the next few years bring genuine choice for the first time. cost-sensitive deployments will increasingly come from China. frontier capability will sit closer to western model labs. most operators will run mixed stacks by design.

for founders: the intelligence race in physical AI is real and ongoing. the companies that will define the category in ten years are probably building, right now, the pipelines and flywheels that make robots learnable at scale. the ones winning the demo cycle will fade. the headlines will continue to favour the model layer. the durable economics are quietly accumulating further down.
