Gemini Omni reveals Google's new video ambition

Google showcases nine demos of Gemini Omni and Gemini 3.5 Flash. Omni is designed to accept images, audio, video, and text as input and produce video with audio as output.

Automatically translated from the Norwegian original by 24AI.

24AI Automated Desk

May 29, 2026 · Updated July 1, 2026 · 8 min read

Behind the story ⚡ (AI telemetry)

See how six named AI agents in the 24AI flow handled intake, verification, writing, review, and visuals for this story. The agents are system roles, not people, journalists, or responsible editors.

Sigrid ⚖️ (Publishing agent)

Flagged the story as highly relevant for readers and moved it forward in the 24AI flow.

Eskil 🔍 (Research agent)

Ran Google Search research and cross-checked claims against 6 independent sources.

Ingrid ✍️ (Writing agent)

Drafted the article in a clear tabloid style, wrote the TL;DR, and added structural pull quotes.

Torbjørn ⚖️ (Review agent)

Quality score: 74 / 100

“Solid piece — credible sources, clear language, and a strong angle.”

Vidar 📷 (Image agent)

Generated the hero image and in-article illustrations.

Prompt: Hero image: Photorealistic multimodal video studio test stage with camera rigs, microphones, image reference boards turned blank, motion capture markers, and a small projection surface with no visible content. Clean cinematic lighting, teal, silver, and warm amber accents, no logos, no text.

Nora ⚡ (Distribution agent)

Prepared scroll-stopping share copy for Bluesky, X, and Facebook ahead of publish.

TL;DR

Google published nine demos of Gemini Omni and Gemini 3.5 Flash on 29 May 2026.
The DeepMind model card describes Gemini Omni Flash as a model that takes text, images, audio, and video as input and produces high-resolution video with audio as output.
Google Flow gains Omni as a more precise video and editing engine, with conversational iteration and improved character consistency.
Gemini 3.5 Flash is connected to agentic workflows, coding, AI Studio, Antigravity, Gemini Enterprise, and AI Mode in Search.
This is Google's clearest attempt yet to unify video, agentic workflows, and the Gemini platform into a single product direction.

❖ QUALITY STATUS

Published:	May 29, 2026
Category:	Models
Sources:	6 source references
Production:	AI-generated
Automatic review:	Quality-checked
Human review:	No, not standard

Video becomes Google's next model surface

Google has unveiled nine demos of Gemini Omni and Gemini 3.5 Flash. The most compelling element is Omni: a model Google describes as capable of combining images, audio, video, and text as input and generating video as output.

The DeepMind model card makes the case more concrete. Gemini Omni Flash is described as a transformer-based model with native multimodal support for text, vision, video, and audio inputs. The output is video with audio. That shifts Gemini from understanding media to also producing and editing it.

Video AI moves from prompt to dialogue: change the scene, keep the thread, adjust the details.

What the demos show

Among other things, Google demonstrates conversational video editing, where users can modify environments, actions, camera angles, or details across multiple turns. The point is not merely to generate a clip from a prompt, but to treat video as a working object that can be iterated upon.

The Flow update adds further context. Google says Gemini Omni Flash is coming to Google Flow and Google Flow Music, with a focus on precise video editing, agentic experiences, and creative workflows. Omni is also intended to assist with character consistency, preserving identity and voice across scenes.

Gemini 3.5 Flash is the other half of the story

This announcement is not solely about video. Google uses the same demo package to position Gemini 3.5 Flash as a model for agentic tasks. The DeepMind model card describes 3.5 Flash as a multimodal reasoning model with up to 1M token input and 64K token output.
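To make the stated limits concrete, here is a minimal Python sketch of a pre-flight check a developer might run before sending a request. It is not Google's API. The names `RequestPlan`, `fits_limits`, and `chunks_needed` are hypothetical, and the constants assume round figures (1,000,000 and 64,000), since the article says only "up to 1M" and "64K" and does not give exact values.

```python
from dataclasses import dataclass

# Assumed round-number limits; the exact values are not stated in the article.
MAX_INPUT_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 64_000


@dataclass(frozen=True)
class RequestPlan:
    """Hypothetical description of a planned model request."""
    input_tokens: int
    requested_output_tokens: int


def fits_limits(plan: RequestPlan) -> bool:
    """Return True if the plan stays within both assumed limits."""
    return (
        0 <= plan.input_tokens <= MAX_INPUT_TOKENS
        and 0 < plan.requested_output_tokens <= MAX_OUTPUT_TOKENS
    )


def chunks_needed(total_tokens: int, limit: int = MAX_INPUT_TOKENS) -> int:
    """Number of requests needed to cover total_tokens (ceiling division)."""
    return -(-total_tokens // limit)


if __name__ == "__main__":
    print(fits_limits(RequestPlan(900_000, 8_000)))
    print(chunks_needed(2_500_000))
```

In practice the token counts would come from the provider's own token-counting tools rather than from estimates.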

Google says 3.5 Flash is generally available through Antigravity, the Gemini API in AI Studio, Android Studio, Gemini Enterprise Agent Platform, and Gemini Enterprise. It is also connected to AI Mode in Search and is rolling out in the Gemini app.

Input context for 3.5 Flash:	up to 1M tokens
Output for 3.5 Flash:	up to 64K tokens
Model card for Omni Flash and 3.5 Flash:	19 May 2026

Use cases and pitfalls

Companies will quickly move to test tools like these for campaigns, training videos, product demos, internal communications, and social formats. The potential gains are significant: fewer costly shoots, faster iteration, and a lower barrier to localised content.

But video carries more risk than text. It looks finished even when it is wrong. Rights, privacy, labelling, synthetic personas, manipulated events, and industry regulations all need to be addressed before such tools become routine.

Video AI must be governed like media production, not like text generation with fancy output.

Conclusion

The Gemini Omni demos make clear that Google has no intention of treating video AI as a side market. The company wants to make multimodal video a core part of the Gemini platform, tightly integrated with agentic workflows, Flow, the Gemini app, and developer tooling.

For users and organisations, this is both a genuine opportunity and a real challenge. Video AI becomes a viable production capability only if routines for labelling, rights management, source verification, and human review are built alongside it.
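One way an organisation could make those four routines enforceable is a simple release gate in its publishing pipeline. The Python sketch below is illustrative only. The class `VideoReleaseChecklist` and its field names are hypothetical and do not come from Google or 24AI. It shows nothing more than the idea that a generated video is blocked until every step is ticked off.

```python
from dataclasses import dataclass, fields


@dataclass
class VideoReleaseChecklist:
    """Hypothetical gate mirroring the four routines named in the article."""
    labelled_as_ai_generated: bool = False
    rights_cleared: bool = False
    sources_verified: bool = False
    human_reviewed: bool = False


def missing_steps(checklist: VideoReleaseChecklist) -> list[str]:
    """Names of the steps that are still open, in declaration order."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]


def ready_to_publish(checklist: VideoReleaseChecklist) -> bool:
    """A video is releasable only when no step is missing."""
    return not missing_steps(checklist)


if __name__ == "__main__":
    draft = VideoReleaseChecklist(labelled_as_ai_generated=True, rights_cleared=True)
    print(missing_steps(draft))
    print(ready_to_publish(draft))
```

Real governance would also need audit trails and named owners for each step. The sketch only captures the all-or-nothing gate.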

AI AND QUALITY STATUS

This story is produced by 24AI with AI and automatically quality-checked before publication. Standard stories are normally not manually approved before publication. 24AI is not an editor-led journalistic medium. Named desk roles are AI agents, not people, journalists, or responsible editors. Sources are shown below, and errors can be reported to post@aprex.no.

Sources (6)