Claude Opus 4.8 spawns hundreds of AI agents simultaneously. GPT-5.5 trails on coding benchmarks.

Anthropic launched Claude Opus 4.8 on May 28, 2026, with Dynamic Workflows, an orchestration engine that lets a single Claude session plan, delegate, and verify work across hundreds of parallel sub-agents. Anthropic's self-reported benchmarks put it ahead of GPT-5.5 on coding tasks.

Automatically translated from the Norwegian original by 24AI.

24AI Automated Desk

June 6, 2026 · 6 min read

Behind the story ⚡ (AI telemetry)

See how six named AI agents in the 24AI flow handled intake, verification, writing, review, and visuals for this story. The agents are system roles, not people, journalists, or responsible editors.

Sigrid ⚖️ (Publishing agent)

Flagged the story as highly relevant for readers and moved it forward in the 24AI flow.

Eskil 🔍 (Research agent)

Ran Google Search research and cross-checked claims against 10 independent sources.

Ingrid ✍️ (Writing agent)

Drafted the article in a clear tabloid style, wrote the TL;DR, and added structural pull quotes.

Torbjørn ⚖️ (Review agent)

Quality score: 74 / 100

“Solid piece — credible sources, clear language, and a strong angle.”

Vidar 📷 (Image agent)

Generated the hero image and in-article illustrations.

Prompt: A wide editorial documentary photo of a developer standing in a bright open-plan tech office, facing a wall of six large curved monitors (all black/off) arranged in two rows. The developer's silhouette is visible from behind, one hand raised as if conducting an orchestra. Thin cables hang from the monitors like puppet strings. On the desk: three closed laptops, a mechanical keyboard, and scattered printed code review sheets. Late afternoon Nordic sunlight streams through floor-to-ceiling windows, creating long shadows. The scene suggests one human conducting hundreds of digital workers. Documentary realism, mild sensor grain, natural asymmetry, bright editorial daylight (5600K). No readable text anywhere.

Nora ⚡ (Distribution agent)

Prepared scroll-stopping share copy for Bluesky, X, and Facebook ahead of publish.

TL;DR

Claude Opus 4.8 introduces Dynamic Workflows, which lets a single Claude Code session orchestrate hundreds of parallel sub-agents; the feature is still in research preview
The model scores 69.2% on SWE-Bench Pro versus GPT-5.5's 58.6%, but treat these numbers with caution: the benchmarks are self-reported by Anthropic
Fast Mode delivers 2.5x higher speed at 3x lower cost than previous models, according to Anthropic, but it is priced at twice the standard per-token rate, and token expenses can escalate dramatically with parallel execution
Pricing is unchanged for the standard model ($5 input / $25 output per million tokens), but parallel agents multiply costs quickly

❖ QUALITY STATUS

Published:	June 6, 2026
Category:	Models
Sources:	10 source references
Production:	AI-generated
Automatic review:	Quality-checked
Human review:	No, not standard

You might think this is a routine model upgrade. It isn't. Anthropic hasn't just made Claude smarter — they've fundamentally changed what a single AI session can do with an entire codebase.

Feature	Claude Opus 4.8	GPT-5.5	Gemini 3.5
SWE-Bench Pro	69.2%	58.6%	not reported
Online-Mind2Web	84%	not reported	not reported
Parallel agents	Yes (Dynamic Workflows)	Limited	Limited
Fast Mode	Yes (2.5x faster, 3x cheaper than previous models)	No	No
Effort control	Yes	No	No
Price, input / output (standard)	$5 / $25 per M tokens	varies	varies
Price, input / output (Fast Mode)	$10 / $50 per M tokens	n/a	n/a
Status	GA; Dynamic Workflows in research preview	GA	GA

Benchmark data: Anthropic official announcement and aitoolsrecap.com. Independent third-party verification is not available as of publication date.

What are Dynamic Workflows?

The core technical principle is called the orchestrator-worker pattern. A single Claude Code session acts as a top-level planner — the orchestrator — breaking down complex tasks into discrete subtasks. It then spawns separate sub-agents, assigns them specific responsibilities, and coordinates their work in parallel.

Once the sub-agents complete their work, the orchestrator verifies the results against a defined requirements specification and reports back to the user. The entire flow takes place within a single session, without the developer having to manually coordinate across different tools or windows.

According to Anthropic's official documentation for Claude Code Sub-Agents, this is an extension of existing sub-agent functionality — but Dynamic Workflows formalizes and automates the coordination layer in a way that did not previously exist.
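
The plan, delegate, and verify loop described above can be illustrated with a minimal Python sketch. This is not Anthropic's implementation and calls no real API: the names plan, worker, verify, and orchestrate are hypothetical, the workers are stubs, and the max_parallel cap is an assumption added purely for illustration.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Result:
    subtask: str
    output: str
    ok: bool


def plan(task: str, n: int) -> list[str]:
    """Orchestrator step 1: break the task into discrete subtasks (stubbed)."""
    return [f"{task} [part {i + 1}/{n}]" for i in range(n)]


async def worker(subtask: str) -> Result:
    """Stand-in for a sub-agent; a real one would call a model here."""
    await asyncio.sleep(0)  # yield control, simulating concurrent I/O
    return Result(subtask, f"done: {subtask}", ok=True)


def verify(result: Result) -> bool:
    """Orchestrator step 3: check a result against a toy requirement."""
    return result.ok and result.output.startswith("done:")


async def orchestrate(task: str, n_workers: int, max_parallel: int = 10) -> dict:
    subtasks = plan(task, n_workers)
    limit = asyncio.Semaphore(max_parallel)  # hypothetical cap on concurrent workers

    async def run(subtask: str) -> Result:
        async with limit:
            return await worker(subtask)

    # Step 2: fan out all subtasks and wait for every worker to finish.
    results = await asyncio.gather(*(run(s) for s in subtasks))
    failed = [r.subtask for r in results if not verify(r)]
    return {"total": len(results), "verified": len(results) - len(failed), "failed": failed}


report = asyncio.run(orchestrate("rename module", n_workers=100))
print(report)
```

The point of the sketch is the shape of the control flow: one planner, many concurrent workers, and a verification pass before anything is reported back.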

Benchmarks: Impressive numbers with important caveats

Anthropic's own figures show 69.2% on SWE-Bench Pro — a demanding benchmark that tests the ability to resolve real GitHub issues. GPT-5.5, by comparison, scores 58.6% on the same benchmark, according to aitoolsrecap.com and userightai.com.

On Online-Mind2Web, which measures browser-based task completion, Anthropic reports 84% — with no direct GPT-5.5 comparison available for this benchmark.

A third improvement is more tangible in practice: according to Anthropic, the model lets four times fewer code errors pass unchallenged than its predecessor. In other words, Claude is now far more likely to flag questionable code than to wave it through. Analyses from decodethefuture.org and orbilontech.com describe this as a genuine behavioral change, but emphasize that the tests were conducted primarily by Anthropic itself.

Benchmarks are useful — but all figures are currently self-reported by Anthropic. Independent third-party verification is still absent.

Fast Mode and Effort Control: Two new dials

Fast Mode is likely to have the greatest immediate impact for most developers. Anthropic reports 2.5x higher inference speed at 3x lower cost compared to previous models. Fast Mode is priced at $10 per million input tokens and $50 per million output tokens, which is double the standard rate per token. The "3x lower cost" is therefore a comparison with previous models, not with standard mode: what Fast Mode buys over standard mode is speed, not savings.

The standard price is unchanged: $5 in / $25 out per million tokens — the same level as the previous Claude Opus version.
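
To make the two price points concrete, here is a small Python calculation using the per-million-token rates quoted above. The request size (20,000 input tokens and 4,000 output tokens) is a made-up example, and request_cost is a hypothetical helper, not part of any SDK.

```python
# USD per million tokens, as quoted in this article.
PRICES = {
    "standard": {"input": 5.00, "output": 25.00},
    "fast": {"input": 10.00, "output": 50.00},
}


def request_cost(mode: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the given mode's per-token rates."""
    p = PRICES[mode]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# Hypothetical request: 20,000 tokens in, 4,000 tokens out.
standard = request_cost("standard", 20_000, 4_000)
fast = request_cost("fast", 20_000, 4_000)

print(f"standard: ${standard:.2f}")         # standard: $0.20
print(f"fast:     ${fast:.2f}")             # fast:     $0.40
print(f"ratio:    {fast / standard:.1f}x")  # ratio:    2.0x
```

For the same token volume, Fast Mode costs twice as much as standard mode at these rates, whatever the request size.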

Effort Control is a new parameter that lets developers explicitly instruct the model how deeply to reason about a given task. Simple routine tasks can be run at low effort with correspondingly lower cost; complex architectural questions can be run at full cognitive depth. According to totalum.app and creeta.com, this provides better cost control in production applications.

69.2%: SWE-Bench Pro score

4x: fewer uncritically accepted code errors

2.5x: speed increase in Fast Mode

What does this mean for development teams?

For teams already using Claude Code, the upgrade is available via existing API integration with no migration work required. The pricing model is unchanged for standard use, lowering the barrier to testing the new functionality.

But here is the critical nuance: Dynamic Workflows is still in research preview. That means limited SLA guarantees, potential API changes, and functionality that is not production-ready for all use cases. Teams considering building business-critical pipelines on top of Dynamic Workflows should wait for general availability, or have a fallback plan in place.

Token costs are the second factor to calculate carefully. A hundred parallel sub-agents each solving their own subtask are billed as a hundred separate API calls. For a mid-sized refactoring job, this can quickly amount to $50–200 in a single run. For large teams with high volume, Dynamic Workflows may still be cost-effective compared to manual coordination — but the math is not universal.
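
A rough way to sanity-check the $50–200 range is to multiply it out at the standard rates. In the Python sketch below, the per-agent token counts are illustrative assumptions, not measurements, and run_cost is a hypothetical helper.

```python
INPUT_PRICE = 5.00    # USD per million input tokens (standard mode, per this article)
OUTPUT_PRICE = 25.00  # USD per million output tokens (standard mode, per this article)


def run_cost(agents: int, input_tokens_each: int, output_tokens_each: int) -> float:
    """Total USD cost of one run in which every agent uses the same token budget."""
    total_in = agents * input_tokens_each
    total_out = agents * output_tokens_each
    return (total_in * INPUT_PRICE + total_out * OUTPUT_PRICE) / 1_000_000


# Assumed budgets: a light and a heavy subtask per agent, 100 agents each.
light = run_cost(100, 50_000, 10_000)
heavy = run_cost(100, 200_000, 40_000)

print(f"light run: ${light:.2f}")  # light run: $50.00
print(f"heavy run: ${heavy:.2f}")  # heavy run: $200.00
```

Under these assumptions, 100 agents reach the low end of the range at about 50,000 input and 10,000 output tokens each, and the high end at four times that. Running the same workload in Fast Mode would double both figures.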

Mythos: What do we know?

Anthropic's May 28, 2026 announcement mentions an upcoming model internally referred to as Mythos, without providing specific details beyond an expected launch "in the coming weeks". As of today, no further information has been made public. Anything said about what Mythos contains is speculation.

The competition: GPT-5.5 and Gemini 3.5

OpenAI and Google have not responded directly to Dynamic Workflows as a concept, but according to osasai.com and digitalstrategy-ai.com, competition over agent-based AI workflows is expected to intensify through the summer of 2026. GPT-5.5 has advantages on certain multimodal tasks and is more deeply integrated into the Microsoft stack. Gemini 3.5 competes primarily on context window size and Google Cloud integration.

On code-specific benchmarks, Anthropic's own figures paint a clear picture — but the absence of independent comparative testing makes it difficult to draw any definitive conclusions about who actually wins in production.

Bottom line

Claude Opus 4.8 is for you if you work on complex, long-running coding tasks where parallel orchestration delivers real time savings — and you can tolerate research preview risk and carefully calculate token costs.

Hold off if you need guaranteed production stability, are handling simple tasks where a single agent suffices, or don't have visibility into what hundreds of parallel API calls will cost in practice.

GPT-5.5 remains the stronger choice for teams deeply integrated into Microsoft infrastructure or requiring broad multimodal support beyond code.

This article is based on Anthropic's official announcement, documentation, and system card, as well as independent analyses from decodethefuture.org, totalum.app, aitoolsrecap.com, and orbilontech.com. Verified against 10 open primary and secondary sources.

AI AND QUALITY STATUS

This story is produced by 24AI with AI and automatically quality-checked before publication. Standard stories are normally not manually approved before publication. 24AI is not an editor-led journalistic medium. Named desk roles are AI agents, not people, journalists, or responsible editors. Sources are shown below, and errors can be reported to post@aprex.no.

Sources (10)

4. decodethefuture.org
5. totalum.app
6. aitoolsrecap.com
7. osasai.com
8. digitalstrategy-ai.com
9. creeta.com
10. orbilontech.com