A thread currently blowing up on r/LocalLLaMA has sparked real buzz: Alibaba's Qwen team has dropped a new series of compact models with little warning, and the community's verdict is unambiguous: people are impressed.

It's not just about the models being small. It's about what they can actually achieve.

Qwen3.5-9B is the model stealing the show right now. At 4-bit quantization it fits on a single RTX 3060 with 12GB of VRAM, a reasonably priced, three-year-old card. Yet reported benchmarks have it beating GPT-5 Nano and Gemini 2.5 Flash-Lite on vision tasks by double-digit margins: on MathVision it scores 78.9 against Google's 62.2. That is no small difference.

A 9B model that outperforms Google's and OpenAI's mini-models — and runs locally on consumer hardware.
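A back-of-envelope calculation shows why the 12GB card is enough. This is a rough sketch using my own assumptions (about 0.5 bytes per parameter at 4-bit, plus a flat allowance for KV cache and runtime buffers), not figures from the release itself:

```python
# Rough memory estimate for a 9B model at 4-bit quantization.
# Assumptions (mine, not from the post): 4-bit weights take
# ~0.5 bytes/parameter; KV cache and runtime buffers add a few GB.

def weight_gb(params_billion: float, bits_per_param: int = 4) -> float:
    """Approximate weight memory in GB for a quantized model."""
    return params_billion * bits_per_param / 8

weights = weight_gb(9)   # ~4.5 GB of 4-bit weights
overhead = 3.0           # assumed KV cache + buffers, in GB

print(f"weights ~{weights:.1f} GB, total ~{weights + overhead:.1f} GB of 12 GB")
```

Even with a generous overhead allowance, the total stays comfortably under the 3060's 12GB, which is consistent with the community reports.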

One of the most interesting aspects is the MoE model Qwen3.5-35B-A3B. It has 35 billion parameters in total but activates only 3 billion during inference — and still surpasses the previous generation's 235B-A22B model. This tells us something important: Alibaba is pushing hard on architecture and data quality rather than just stacking more parameters. It's a clear trend we're going to see more of.
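The MoE numbers are worth making concrete. The following is illustrative arithmetic based only on the parameter counts mentioned above (35B-A3B versus the previous generation's 235B-A22B); the active-parameter share is a rough proxy for per-token compute, not a benchmark:

```python
# Illustrative MoE arithmetic from the figures in the post:
# 35B total / 3B active vs. the older 235B total / 22B active.

def active_share(total_b: float, active_b: float) -> float:
    """Fraction of parameters used on each forward pass."""
    return active_b / total_b

new_gen = active_share(35, 3)    # Qwen3.5-35B-A3B
old_gen = active_share(235, 22)  # previous-generation 235B-A22B

print(f"new: {new_gen:.1%} active, old: {old_gen:.1%} active")
print(f"active params per token: 3B vs 22B (~{22 / 3:.0f}x fewer)")
```

Roughly 7x fewer parameters touched per token while reportedly matching or beating the larger model: that is the architecture-and-data story in one line.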

All models are natively multimodal (text, image, and video from the same weights), support a 262K context window (expandable to around 1M tokens), and cover 201 languages and dialects. They are already available via Ollama, LM Studio, llama.cpp, and MLX.

The smallest models (0.8B and 2B) push this even further: they are designed to run directly on mobile phones, requiring 3GB to 5GB of total memory.

A couple of caveats are worth mentioning. These are early signals from community sources, and user experiences vary. Some report hallucinations on specialized coding tasks (especially Solidity), while others have diametrically opposite experiences. Such variations are common at launch, and more systematic testing will follow.

Why is this important? Because the threshold for what can run locally — on your own machine, without API costs, without data sharing — just dropped again. And it's happening fast.

Keep an eye on this. Mainstream tech media hasn't picked it up yet.