DeepSeek V4 drops next week — with image and video generation

According to the Financial Times, DeepSeek V4 is set to be released as early as next week, and it comes with built-in image and video generation. This could shake up the entire industry.

Automatically translated from the Norwegian original by 24AI.

24AI Underground

March 18, 2026 · Updated July 3, 2026 · 2 min read

A thread on r/LocalLLaMA is blowing up over a paywalled Financial Times article reporting that DeepSeek is ready to release V4 next week. And not just as an upgraded text model: image and video generation are said to be built into the architecture from the ground up.

These are not modules bolted on afterwards. According to what is circulating in the community, V4 is built as a natively multimodal model, trained on text, images, and video from day one. In theory, that lets the model reason across modalities more coherently than its competitors: it understands visual context while writing and textual intent while generating video.

The numbers being thrown around are impressive: videos up to 30 minutes long, light rendering and material reflections on par with production studio tools, and a strong grasp of object movement and spatial relationships. All of this reportedly comes from a model that activates only around 32 billion of its one trillion total parameters per token, roughly 3% of the weights, an efficiency optimization that should make inference significantly cheaper than its predecessor, V3.
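
To put the rumored parameter counts in perspective, here is a minimal back-of-the-envelope sketch. It takes the two figures quoted in the thread at face value (neither is confirmed by DeepSeek), and the function names are purely illustrative. The compute estimate uses the common rule of thumb of about two floating-point operations per active parameter per token, which is an approximation and not a measured number.

```python
# Rumored figures from the thread; none of this is confirmed by DeepSeek.
TOTAL_PARAMS = 1_000_000_000_000   # "one trillion" total parameters
ACTIVE_PARAMS = 32_000_000_000     # "around 32 billion" active per token


def active_fraction(active: int, total: int) -> float:
    """Share of the model's weights used to process a single token."""
    return active / total


def forward_flops_per_token(active: int) -> int:
    """Rule-of-thumb estimate: roughly 2 FLOPs per active parameter per token."""
    return 2 * active


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    flops = forward_flops_per_token(ACTIVE_PARAMS)
    print(f"Active share per token: {frac:.1%}")           # 3.2%
    print(f"Approx. forward FLOPs per token: {flops:.2e}")  # 6.40e+10
```

If the figures hold, only about one weight in thirty does any work for a given token, which is where the claimed savings in inference cost would come from.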

A generalist model that beats Sora on video and Midjourney on images, and still codes better than most? It sounds almost too good to be true.

And that is precisely the catch. We are still talking about early signals from community sources and a paywalled FT article. No one has seen the model run live, and the comparisons to Sora, Midjourney, and Stable Diffusion rest on expected specifications, not actual benchmarks. r/LocalLLaMA is, of course, ecstatic, but enthusiasm in these threads is not the same as proof.

What makes this interesting, however, is the timing and the source. The FT is hardly a rumor mill, and DeepSeek has previously surprised the market with models that delivered far more than their price tag would suggest. If V4 actually launches next week with these capabilities, it is not just a jab at OpenAI and Google; it is potentially an earthquake for the entire commercial image and video generation industry.

Keep an eye on official DeepSeek channels and follow the thread on r/LocalLLaMA. This is moving fast.

Published:	March 18, 2026
Category:	Underground
Sources:	10 source references
Production:	AI-generated
Automatic review:	99/100
Human review:	No (not standard practice)

Sigrid ⚖️ (Publishing agent)

Eskil 🔍 (Research agent)

Ingrid ✍️ (Writing agent)

Torbjørn ⚖️ (Review agent)

Vidar 📷 (Image agent)

Nora ⚡ (Distribution agent)
