Google's new 12B model runs advanced AI directly on your laptop

Google DeepMind has launched Gemma 4 12B – an open, multimodal model without separate encoders that can run locally on machines with 16 GB of RAM. But its performance still lags behind the very best frontier models.

Automatically translated from the Norwegian original by 24AI.

24AI Automated Desk

June 9, 2026·Updated June 12, 2026·4 min read

Google DeepMind has released a new open model aimed at bringing advanced multimodal AI directly to ordinary consumer machines. Gemma 4 12B was officially launched on June 3, 2026, and is technically distinguished from most competitors by dropping separate encoders for audio and images in favor of a unified, encoder-free architecture.

What makes the architecture special?

Most multimodal models are built around separate encoders, dedicated modules for interpreting images and audio, which can account for 150 to 550 million parameters for vision and a further 300 million for audio. Gemma 4 12B replaces these with lightweight embedding modules that project raw data directly into the same embedding space as text tokens.

For images, this means 48×48 pixel patches are processed with a single matrix multiplication. For audio, the raw signal is projected directly without an intermediate encoder step. According to Google DeepMind, this reduces both latency and memory usage compared to traditional setups.
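As a rough illustration of the image path described above, the following sketch cuts an image into 48×48 patches and maps each flattened patch to a token-sized vector with one matrix multiplication. The RGB input, the tiny embedding width `D_MODEL`, the random weights, and the function names are all assumptions for illustration; the article does not give the model's real dimensions or implementation.

```python
import random

PATCH = 48      # patch edge in pixels, as stated in the article
CHANNELS = 3    # assumption: RGB input
D_MODEL = 8     # hypothetical embedding width, kept tiny for the demo


def patchify(image, patch=PATCH):
    """Split an H x W x C image (nested lists) into flattened patch vectors."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([
                value
                for row in image[top:top + patch]
                for pixel in row[left:left + patch]
                for value in pixel
            ])
    return patches


def project(patches, weight):
    """Single matrix multiplication: (n, patch_dim) x (patch_dim, d_model)."""
    columns = list(zip(*weight))
    return [
        [sum(x * w for x, w in zip(vec, col)) for col in columns]
        for vec in patches
    ]


rng = random.Random(0)
# A synthetic 96 x 96 image, so it splits into a 2 x 2 grid of patches.
image = [[[rng.random() for _ in range(CHANNELS)] for _ in range(96)] for _ in range(96)]
patch_dim = PATCH * PATCH * CHANNELS
weight = [[rng.gauss(0.0, 0.02) for _ in range(D_MODEL)] for _ in range(patch_dim)]

patches = patchify(image)
tokens = project(patches, weight)
print(len(tokens), "image tokens of width", len(tokens[0]))
```

In this toy setup each patch becomes one vector of the same width as a text-token embedding, which is what would let image and text tokens share a single sequence.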

Gemma 4 12B is not merely an incremental update – it is Google's blueprint for bringing genuine multimodal capability to local devices


Specifications and availability

The model has 11.95 billion parameters distributed across 48 layers, a context window of 256,000 tokens, and a vocabulary of 262,000 tokens. It uses sliding-window attention with a window of 1,024 tokens. The model is available in both a pre-trained and an instruction-tuned variant under the Apache 2.0 license, which allows free use, modification, and commercial use.
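To relate the parameter count to the 16 GB of RAM mentioned earlier, here is a back-of-the-envelope estimate of the memory needed for the weights alone. The storage formats listed are assumptions, since the article does not say which precision is used for local inference, and the estimate ignores the KV cache, activations, and operating-system overhead.

```python
PARAMS = 11.95e9   # parameter count from the article
RAM_GB = 16        # RAM figure from the article

# Assumed storage formats; the article does not say which one is used locally.
BYTES_PER_WEIGHT = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params, bytes_per_weight):
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bytes_per_weight / 1e9


for name, size in BYTES_PER_WEIGHT.items():
    gb = weight_memory_gb(PARAMS, size)
    verdict = "fits" if gb < RAM_GB else "does not fit"
    print(f"{name}: {gb:.2f} GB of weights, {verdict} in {RAM_GB} GB")
```

Under these assumptions, 16-bit weights alone would need about 23.9 GB, while 8-bit weights need about 12 GB and 4-bit weights about 6 GB. That suggests the 16 GB figure presumes some form of quantization, although the article does not state this.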

11.95B parameters

256K-token context window

Performance against the competition

According to Google DeepMind's own benchmarks, Gemma 4 12B delivers results that approach the significantly larger Gemma 4 26B MoE model on standard tests, while using less than half the memory footprint. On benchmarks such as DocVQA the gap is small, while the model falls further behind on coding tasks and MMLU Pro.

Compared to its predecessor, the larger Gemma 3 27B, the 12B model wins consistently, suggesting a generational leap in efficiency.

Against competing open models the picture is more nuanced. Gemma 4 12B is clearly faster at inference than Alibaba's Qwen 3.6 27B, at around 58 tokens per second versus 32. Nevertheless, Qwen 3.6 27B outperforms it on coding tasks, translation, and general text quality in practical use, according to community benchmarks.
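To put the two throughput figures in practical terms, this small calculation converts them into waiting time for a single response. The response length of 1,000 tokens is a hypothetical value, and a steady decoding rate is assumed; real throughput varies with hardware, prompt length, and quantization.

```python
GEMMA_TPS = 58   # tokens per second, figure quoted in the article
QWEN_TPS = 32    # tokens per second, figure quoted in the article


def generation_seconds(num_tokens, tokens_per_second):
    """Time to generate num_tokens at a steady decoding rate."""
    return num_tokens / tokens_per_second


reply_tokens = 1000  # hypothetical response length
gemma = generation_seconds(reply_tokens, GEMMA_TPS)
qwen = generation_seconds(reply_tokens, QWEN_TPS)
print(f"Gemma 4 12B: {gemma:.1f} s, Qwen 3.6 27B: {qwen:.1f} s, ratio {qwen / gemma:.2f}x")
```

At these rates a 1,000-token answer takes roughly 17 seconds on Gemma 4 12B and roughly 31 seconds on Qwen 3.6 27B, a speedup of about 1.8 times.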

A few benchmarks suggest that Gemma 4 12B actually loses to Qwen 2.5 9B, a smaller model, on five out of eight tasks.

Far behind the frontier models

Despite its innovative architecture, Gemma 4 12B ranks well below the leading frontier models on Arena.AI's leaderboard, as do the larger Gemma 4 variants: Gemma 4 31B is ranked 39th and Gemma 4 26B A4B is ranked 57th. Models such as Anthropic's Claude Opus 4 operate at a significantly higher level.

This underscores that Google DeepMind's priority with Gemma 4 12B is local deployability and efficiency – not competing at the top tier of performance.

Gemma 4 12B is a local AI powerhouse – but frontier models are still far ahead

Who is the model intended for?

Olivier Lacombe and Gus Martins from Google DeepMind describe the model as designed to bring "high-performance multimodal intelligence directly to your laptop." The ability to run locally makes it particularly relevant for use cases where privacy is paramount or where internet access is limited.

Analytics Vidhya characterizes the 12B model as "Google's blueprint for local multimodal AI", a strategic choice that prioritizes accessibility for developers and hobbyists over raw performance in cloud environments.

The model is available now through Google DeepMind's official channels and open distribution platforms.

Published:	June 9, 2026
Category:	Models
Sources:	10 source references
Production:	AI-generated
Automatic review:	93/100
Human review:	No, not standard



Sigrid ⚖️(Publishing agent)

Eskil 🔍(Research agent)

Ingrid ✍️(Writing agent)

Torbjørn ⚖️(Review agent)

Vidar 📷(Image agent)

Nora ⚡(Distribution agent)


