Physical AI — systems such as robots and self-driving vehicles that must navigate and act in the real world — has long required a patchwork of specialized models that needed to communicate with one another. NVIDIA now wants to put an end to that fragmented approach.

One Model for Everything

Cosmos 3 is built on what NVIDIA describes as a Mixture-of-Transformers (MoT) architecture and marks a significant departure from earlier Cosmos generations. Where those versions split world generation, scene understanding, controlled generation, and policy generation across separate models, Cosmos 3 handles all of these tasks within a single unified system, in a single forward pass, according to the NVIDIA blog.
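NVIDIA's announcement does not spell out Cosmos 3's internals. As a rough illustration of the general Mixture-of-Transformers idea described in published MoT research (not NVIDIA's code), the toy layer below gives each modality its own projection weights, selected deterministically by each token's modality tag, while self-attention runs globally over the whole interleaved sequence. That shared attention is what lets, say, an action token draw on video tokens within one forward pass. The hidden size, modality names, and the `ToyMoTLayer` class are hypothetical, and feed-forward blocks and normalization are omitted for brevity.

```python
import math
import random

D = 4  # hypothetical hidden size; real models use thousands of dimensions
MODALITIES = ("text", "video", "action")  # illustrative subset of modalities


def rand_matrix(rng: random.Random, n: int) -> list[list[float]]:
    """Random n x n matrix standing in for learned weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]


def matvec(w: list[list[float]], x: list[float]) -> list[float]:
    """Compute w @ x for a list-of-rows matrix."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class ModalityWeights:
    """Query/key/value/output projections owned by a single modality."""

    def __init__(self, rng: random.Random):
        self.wq = rand_matrix(rng, D)
        self.wk = rand_matrix(rng, D)
        self.wv = rand_matrix(rng, D)
        self.wo = rand_matrix(rng, D)


class ToyMoTLayer:
    """Toy Mixture-of-Transformers attention layer (hypothetical, not NVIDIA's code)."""

    def __init__(self, seed: int = 0):
        rng = random.Random(seed)
        # Separate parameters per modality: the "mixture" in MoT.
        self.weights = {m: ModalityWeights(rng) for m in MODALITIES}

    def forward(self, tokens):
        """tokens: list of (modality, vector) pairs forming one interleaved sequence."""
        q, k, v = [], [], []
        for modality, x in tokens:
            w = self.weights[modality]  # routing by modality tag, no learned router
            q.append(matvec(w.wq, x))
            k.append(matvec(w.wk, x))
            v.append(matvec(w.wv, x))

        scale = 1.0 / math.sqrt(D)
        out = []
        for i, (modality, x) in enumerate(tokens):
            # Global attention: each token attends to every token of every modality.
            scores = [scale * sum(a * b for a, b in zip(q[i], kj)) for kj in k]
            peak = max(scores)
            exps = [math.exp(s - peak) for s in scores]  # numerically stable softmax
            total = sum(exps)
            attn = [e / total for e in exps]
            mixed = [sum(a * vj[d] for a, vj in zip(attn, v)) for d in range(D)]
            # Modality-specific output projection, then a residual connection.
            proj = matvec(self.weights[modality].wo, mixed)
            out.append((modality, [xi + pi for xi, pi in zip(x, proj)]))
        return out


layer = ToyMoTLayer(seed=0)
sequence = [
    ("text", [1.0, 0.0, 0.0, 0.0]),
    ("video", [0.0, 1.0, 0.0, 0.0]),
    ("action", [0.0, 0.0, 1.0, 0.0]),
]
for modality, vec in layer.forward(sequence):
    print(modality, [round(val, 3) for val in vec])
```

The design choice this illustrates: parameters are split by modality, so each data type keeps specialized weights, but there is still only one model and one attention pass, rather than separate networks exchanging outputs.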

The model can process and generate text, images, video, ambient audio, and action data. That last point is particularly important for robotics: Cosmos 3 can produce concrete numerical action data such as joint angles and grasp positions, which robots can learn from directly.
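To make "concrete numerical action data" tangible, here is a minimal sketch of how a downstream controller might hold and sanity-check such output. The `ActionStep` schema, units, base frame, and joint limits are all assumptions for illustration; NVIDIA has not published this format here. The one step it shows, clamping predicted joint angles to the robot's limits before execution, is a common safeguard when acting on any learned policy.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ActionStep:
    """One timestep of predicted robot action (hypothetical schema)."""
    joint_angles: tuple[float, ...]             # radians, one entry per joint
    grasp_position: tuple[float, float, float]  # meters, assumed robot base frame
    gripper_closed: bool


def clamp_to_limits(step: ActionStep, lower: tuple[float, ...],
                    upper: tuple[float, ...]) -> ActionStep:
    """Clamp each predicted joint angle into [lower[i], upper[i]] before execution."""
    if not (len(step.joint_angles) == len(lower) == len(upper)):
        raise ValueError("joint count does not match limit vectors")
    clamped = tuple(min(max(a, lo), hi)
                    for a, lo, hi in zip(step.joint_angles, lower, upper))
    return replace(step, joint_angles=clamped)


# A short predicted trajectory with one out-of-range joint value in the second step.
trajectory = [
    ActionStep((0.1, 1.2, -0.4), (0.40, 0.00, 0.20), False),
    ActionStep((0.2, 1.9, -3.1), (0.42, 0.01, 0.12), True),
]
LOWER = (-2.5, -2.5, -2.5)  # hypothetical joint limits, radians
UPPER = (2.5, 2.5, 2.5)
safe = [clamp_to_limits(step, LOWER, UPPER) for step in trajectory]
for step in safe:
    print(step.joint_angles, step.grasp_position, step.gripper_closed)
```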

"The Cosmos 3 family gives developers a generational leap in the ability to build robots, autonomous vehicles, and vision AI that perceive, reason, plan, and act in the physical world." — Jensen Huang, Founder and CEO, NVIDIA

Two Model Sizes, with a Third Planned for the Edge

Cosmos 3 launches in two variants with clearly distinct use cases:

Cosmos 3 Nano pairs an 8-billion-parameter reasoner with an 8-billion-parameter generator and is sized for efficient inference on workstation-class hardware, specifically NVIDIA's RTX PRO 6000 GPU. This makes the model accessible to developers without data center infrastructure.

Cosmos 3 Super is a 32-billion-parameter model designed for large-scale synthetic data generation and research, and runs on NVIDIA's Hopper and Blackwell GPUs.

A third variant, currently referred to as Cosmos 3 Edge, has been announced for real-time inference directly on edge devices, but has not yet been released.

At a glance: Cosmos 3 Nano, 8B parameters; Cosmos 3 Super, 32B parameters.
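A back-of-the-envelope calculation helps explain why the smaller model targets a single workstation GPU while the larger one targets data center Hopper and Blackwell hardware: weight memory is roughly parameter count times bytes per parameter. This sketch counts weights only (no activations, KV caches, or framework overhead), and the precisions listed are common formats, not NVIDIA's stated deployment settings. The 16B row is included on the assumption that Nano's 8B reasoner and 8B generator are both resident at once.

```python
# Common storage formats and their size per parameter, in bytes.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp8": 1.0}


def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Billions of parameters times bytes per parameter gives gigabytes directly.
    Activations, KV caches, and framework overhead are NOT included.
    """
    return params_billion * bytes_per_param


# 16B covers the assumed case where Nano's reasoner and generator are both loaded.
for label, size in [("8B", 8), ("16B", 16), ("32B", 32)]:
    row = ", ".join(f"{fmt}: {weight_memory_gb(size, b):.0f} GB"
                    for fmt, b in BYTES_PER_PARAM.items())
    print(f"{label} -> {row}")
```

Comparing these figures against a given GPU's memory gives a first, optimistic check of whether a model fits; real deployments need headroom beyond the weights.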

From Months to Days — According to NVIDIA Itself

NVIDIA's claims are ambitious: the company says Cosmos 3 could cut training and evaluation cycles for physical AI from months to days. These are NVIDIA's own figures, and no independent verification of the savings was available at the time of publication.

Among the stated use cases are synthetic data generation for warehouse safety scenarios, robot training for tasks such as folding laundry and pick-and-place operations, and the generation of rare driving scenarios for autonomous vehicles — the so-called "long-tail" situations for which real-world data is difficult to collect.

Cosmos 3 can serve as the backbone of what NVIDIA calls World Action Models: systems that allow robots to learn directly from simulated worlds.

Open Source and Benchmark-Topping

The model has been released as open source, with both model weights and training scripts available on Hugging Face and GitHub. According to the NVIDIA blog, Cosmos 3 tops the rankings among open models on a range of industry benchmarks: Artificial Analysis, Physics-IQ, PAI-Bench, and R-Bench for world generation, and RoboLab and RoboArena for action policies.

The Competition: Fragmented, but Established

Cosmos 3 does not compete directly with low-level frameworks such as ROS 2 and MoveIt, but it does challenge the traditional division of labor in physical AI development. ROS 2 remains the industry standard for robot middleware, handling communication and real-time control, while Cosmos 3 operates at a higher level of abstraction — and is intended to be integrated into ROS-based systems rather than replace them. NVIDIA already offers Isaac ROS as a bridge between its models and the ROS ecosystem.

What Cosmos 3 really challenges is the fragmented pattern in which separate models for simulation, reasoning, and action generation must be coordinated manually, an approach that has until now been the norm in the field.

Cosmos 3 is NVIDIA's clearest signal yet that the company views physical AI — not just language models — as the next major growth area. Whether the technical promises hold up in practice is something research communities and industry partners will soon have the opportunity to put to the test.