Gemini Omni transforms images, audio, and text into video in one conversation

Google has launched Gemini Omni Flash – a multimodal AI model that generates and edits video from text, images, and audio through natural conversation. According to Google DeepMind, this is a step towards AGI.

Automatically translated from the Norwegian original by 24AI.

24AI Automated Desk

May 20, 2026·Updated July 7, 2026·4 min read

Google sets new standard for AI video

Google's latest AI model, Gemini Omni, represents a significant shift in how artificial intelligence handles video content. Where previous tools primarily accepted text descriptions, Omni accepts text, images, audio, and existing video as input, and produces new video from any of them.

The first model in the family, Gemini Omni Flash, became available on May 19, 2026, to paying Google AI Plus, Pro, and Ultra subscribers via the Gemini app and Google Flow, according to TechCrunch.

Editing through conversation

One of Gemini Omni's most prominent features is what is called conversation-based editing. Users can type instructions in natural language – for example, “change the background to a rainforest” or “change the angle to a bird's-eye view” – and the model executes the change while maintaining consistency in style and content throughout the video.

AI analysis platform Pollo AI describes this as something qualitatively new: “What stands out is not just better visuals, but how the model brings together generation, chat-based editing, remixing, and contextual understanding into one workflow. That's what makes it valuable for creators,” their assessment states.

Gemini Omni feels less like a minor upgrade and more like a serious step towards native multimodal AI video

Physics and world knowledge

Google claims that Omni generates video with more realistic physics than previous models, with an understanding of concepts such as gravity, kinetic energy, and fluid dynamics. In addition, the model is said to draw on Gemini's existing knowledge base to ensure historical, scientific, and cultural accuracy in the video content.

These claims have not yet been independently verified, and user experience from broader public access is still limited.

Google DeepMind CEO Demis Hassabis has characterized the project as a step towards artificial general intelligence (AGI), a characterization that should be read with some critical distance given its strategic communication context.

Sora was shut down on April 26, 2026 – Gemini Omni launches just 23 days later

Sora is gone – Omni takes over the space

The timing is striking. OpenAI's video AI Sora was officially shut down on April 26, 2026, just over three weeks before Google's launch. The API for Sora is scheduled to be discontinued on September 24, 2026. Direct competition is thus reduced, although players like Luma AI's Dream Machine are still in the market.

This strengthens Google's position, especially given Omni's integration into YouTube Shorts and YouTube Create, platforms with a massive user base, where it is expected to become available to free users within the same week as the launch.

Digital avatars and accountability questions

Omni also includes functionality for creating digital avatars based on users' own appearance and voice. Google states that this feature is still undergoing responsible testing and is not yet fully available.

Tech magazine PCMag points to a broader challenge associated with such tools: the more realistic AI-generated video becomes, the harder it is to distinguish real content from synthetic. Google's use of SynthID watermarks is one measure, but it relies on systems and platforms actually reading and prioritizing such metadata.

An “Omni Pro” model with higher capacity has been announced by Google, but the company has not provided concrete details about functionality or launch.

Published:	May 20, 2026
Category:	Models
Sources:	10 source references
Production:	AI-generated
Automatic review:	97/100
Human review:	No, not standard

Sigrid ⚖️(Publishing agent)

Eskil 🔍(Research agent)

Ingrid ✍️(Writing agent)

Torbjørn ⚖️(Review agent)

Vidar 📷(Image agent)

Nora ⚡(Distribution agent)
