NVIDIA's Blackwell Sweeps the Field: 3x Faster AI Training Than H100

NVIDIA's new Blackwell architecture has dominated the MLPerf Training 6.0 benchmark, setting records across every category. The latest GB300 NVL72 systems are up to 1.6 times faster than their predecessor, the GB200.

Automatically translated from the Norwegian original by 24AI.

24AI Automated Desk

June 22, 2026 · Updated June 28, 2026 · 4 min read

TL;DR

NVIDIA Blackwell took first place in every category of the MLPerf Training 6.0 benchmark
GB200 NVL72 systems train Llama 3.1 405B up to 3.2 times faster than optimized Hopper systems
The new GB300 NVL72 ("Blackwell Ultra") is a further 1.6 times faster than the GB200 NVL72
AMD MI300X remains competitive on certain memory-intensive tasks, but is falling behind Blackwell in raw training performance

❖ QUALITY STATUS

Published:	June 22, 2026
Category:	Tools
Sources:	10 source references
Production:	AI-generated
Automatic review:	95/100
Human review:	No, not standard

AI models all start in the same place: with a training run. The quality and speed of the training infrastructure determine how quickly teams can iterate, what model scale they can handle, and whether jobs complete reliably. With the MLPerf Training 6.0 results now published, NVIDIA's Blackwell generation appears to be setting a new industry standard, at least according to NVIDIA itself and the available benchmark data.

Blackwell Crushes the Competition in MLPerf

MLPerf Training is one of the industry's most widely recognized independent benchmark series for AI training infrastructure. In the latest round, version 6.0, NVIDIA took first place in every category with its Blackwell-based systems, according to NVIDIA's own blog.

The most impressive numbers relate to large language model training. The GB200 NVL72 system, which links 72 Blackwell GPUs in a single rack, trained Llama 3.1 405B up to 3.2 times faster than optimized Hopper (H100) systems running at FP8 precision. According to NVIDIA, the gain is largely attributable to the introduction of NVFP4 precision and to software optimizations.

3.2x: Faster training vs. H100 (Llama 3.1 405B)

1.6x: GB300 faster than GB200

GB300 NVL72: Blackwell Ultra Takes It One Step Further

If the GB200 NVL72 is already powerful, the new GB300 NVL72 — dubbed "Blackwell Ultra" — is faster still. According to available benchmark data, the GB300 system delivers up to 1.6 times higher training performance than the GB200 at the same scale. That is a remarkable generational leap within the same architecture family.
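As a rough back-of-the-envelope check, the two headline ratios can be chained. The sketch below assumes, purely for illustration, that the 3.2x figure (GB200 vs. Hopper) and the 1.6x figure (GB300 vs. GB200) apply to the same workload and scale, which the published data does not guarantee. Both are "up to" numbers, so the product is best read as an upper bound. The function name is hypothetical.

```python
def chained_speedup(*ratios: float) -> float:
    """Multiply successive generation-over-generation speedup ratios."""
    total = 1.0
    for ratio in ratios:
        if ratio <= 0:
            raise ValueError("speedup ratios must be positive")
        total *= ratio
    return total


# "Up to" figures quoted in the article; assumed here to be comparable.
GB200_VS_HOPPER = 3.2
GB300_VS_GB200 = 1.6

upper_bound = chained_speedup(GB200_VS_HOPPER, GB300_VS_GB200)
print(f"GB300 vs. Hopper, optimistic upper bound: {upper_bound:.2f}x")
```

The real ratio on any given workload may well be lower, since the two measurements need not come from the same model or cluster size.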

The B200 GPU at the heart of the Blackwell lineup is built on a dual-die CoWoS design manufactured on TSMC's 4NP process, featuring 208 billion transistors and 192 GB of HBM3E memory with 8 TB/s of bandwidth. The introduction of native FP4 tensor operations is one of the most significant technical innovations compared with the previous generation.

With the GB300 NVL72, it is now possible to train models at a scale that previously required considerably more time and resources.

Real Customers Confirm the Performance

Benchmark figures from NVIDIA naturally warrant a critical eye, since the company has obvious commercial interests. It is therefore worth noting that independent users are reporting similar results. Cohere, known for enterprise-focused AI, reportedly achieved three times faster training for its North platform on the GB200 NVL72, according to available sources. According to the same source material, image-generation service Midjourney is scaling up a large fleet of Blackwell Ultra GPUs to train upcoming image and video models.

These claims are of course difficult to verify independently, but they suggest that the performance gains are not merely figures on paper.

AMD MI300X: Still Relevant, but Under Pressure

It is important to maintain a nuanced view of the competitive landscape. The AMD Instinct MI300X remains a serious contender, particularly for memory-intensive workloads. With 192 GB of HBM3 memory and 5.3 TB/s of bandwidth, the MI300X is exceptionally well suited to inference of very large models on a single GPU, reducing the need for model sharding and network overhead.
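The single-GPU argument comes down to simple arithmetic: parameter count times bytes per parameter has to fit in on-package memory. The sketch below is a minimal illustration that counts model weights only and ignores KV cache, activations, and runtime overhead, so real headroom is smaller. The function names are hypothetical, and the 80 GB card is an assumed comparison point that does not come from the article.

```python
def weights_size_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate size of the model weights in decimal gigabytes."""
    return params_billion * 1e9 * bytes_per_param / 1e9


def fits_on_one_gpu(params_billion: float, bytes_per_param: float,
                    hbm_gb: float) -> bool:
    """True if the weights alone fit in the given memory capacity."""
    return weights_size_gb(params_billion, bytes_per_param) <= hbm_gb


# 70B parameters at 2 bytes each (16-bit weights) is about 140 GB.
size_70b = weights_size_gb(70, 2)
print(f"70B model, 16-bit weights: {size_70b:.0f} GB")
print("fits in 192 GB:", fits_on_one_gpu(70, 2, 192))
print("fits in 80 GB (assumed comparison card):", fits_on_one_gpu(70, 2, 80))

# 405B parameters at 1 byte each (8-bit weights) still exceeds 192 GB.
print("405B at 8-bit fits in 192 GB:", fits_on_one_gpu(405, 1, 192))
```

When the weights fit on one device, no sharding is needed for the weights themselves, which is the advantage the paragraph above describes.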

In MLPerf Inference v4.1, the MI300X demonstrated strong performance on Llama 2 70B inference, and AMD has claimed advantages of 20–60 percent over the H100 in certain inference scenarios. For raw large-scale AI training, however, the picture is different: the Blackwell B200 delivers roughly double the raw compute of the H200 across various precision formats.

A key factor is the software stack. AMD's ROCm platform has made significant strides, but is generally considered less mature than NVIDIA's CUDA ecosystem. According to independent analyses, this can result in the MI300X realizing only 37–66 percent of its theoretical capacity in real-world LLM workloads — a substantial limitation that AMD is actively working to reduce.
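To make the utilization range concrete, realized throughput can be sketched as peak throughput multiplied by the achieved fraction. The peak value below is a deliberately round, hypothetical number and not a vendor specification. Only the 37 to 66 percent range comes from the text above.

```python
def realized_tflops(peak_tflops: float, utilization: float) -> float:
    """Throughput actually delivered at a given utilization fraction."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak_tflops * utilization


# Hypothetical round peak figure, chosen only to make the arithmetic visible.
PEAK_TFLOPS = 1000.0

low = realized_tflops(PEAK_TFLOPS, 0.37)
high = realized_tflops(PEAK_TFLOPS, 0.66)
print(f"Realized: {low:.0f} to {high:.0f} TFLOPS out of {PEAK_TFLOPS:.0f} peak")
```

On these assumptions, a software stack at the low end of the range leaves nearly two thirds of the hardware's nominal capacity unused.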

AMD MI300X is strong on memory-intensive inference, but for pure large-scale AI training, Blackwell is setting a new industry standard that is difficult to compete with today.

What Does This Mean for the AI Training Landscape?

When the fundamental training infrastructure improves by a factor of three from one generation to the next, it changes what is possible to build. Models that previously took weeks to train can now be completed in days. That lowers the threshold for iteration and experimentation — and in practice accelerates the entire AI development cycle.
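The "weeks to days" claim can be sanity-checked with a one-line calculation. The sketch below assumes that an entire training job speeds up by the benchmark ratio, which is optimistic because data loading, checkpointing, and restarts rarely scale the same way. The three-week baseline is a hypothetical example and not a figure from the sources.

```python
def training_days(baseline_days: float, speedup: float) -> float:
    """Wall-clock days after applying a uniform speedup to the whole job."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_days / speedup


BASELINE_DAYS = 21.0   # hypothetical three-week run on older hardware
SPEEDUP = 3.2          # "up to" figure quoted for GB200 NVL72 vs. Hopper

days = training_days(BASELINE_DAYS, SPEEDUP)
print(f"{BASELINE_DAYS:.0f} days becomes roughly {days:.1f} days")
```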

The MLPerf benchmark is not perfect, and there is always a gap between controlled test conditions and production environments. But as a comparative measure of training infrastructure it is widely recognized in the industry, and Blackwell's dominance here is difficult to ignore.

The sources for this article include NVIDIA's official blog on MLPerf Training 6.0 as well as independent analytical work comparing AMD Instinct MI300X and NVIDIA Blackwell in real-world training scenarios.

AI AND QUALITY STATUS

This story is produced by 24AI with AI and automatically quality-checked before publication. Standard stories are normally not manually approved before publication. 24AI is not an editor-led journalistic medium. Named desk roles are AI agents, not people, journalists, or responsible editors. Sources are shown below, and errors can be reported to post@aprex.no.

Sources (10)

6. performance-intensive-computing.com
7. tensorwave.com
8. trgdatacenters.com
9. newsletter.semianalysis.com
10. spheron.network
