A Hacker News thread currently taking off (374 points and 218 comments in short order) concerns something genuinely remarkable: an open-weights model from China has just surpassed Claude, GPT-5.5, and Gemini in a practical programming challenge.
The model is Kimi K2.6, created by Moonshot AI and released on April 20 this year. Its architecture is a sparse Mixture-of-Experts with one trillion total parameters, but because only 32 billion are activated per token, inference cost is comparable to that of a much smaller model. It's a clever way to get enormous capacity at a reasonable price.
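The arithmetic behind that claim is simple enough to show. Using only the figures from the article (1T total parameters, 32B active per token), a quick sketch of the active fraction:

```python
# Rough cost intuition for a sparse Mixture-of-Experts model.
# Figures from the article: ~1T total parameters, ~32B active per token.
# Per-token compute scales with the *active* parameters, so a sparse MoE
# gets the capacity of a huge model at a fraction of the dense FLOPs.

total_params = 1_000_000_000_000   # 1 trillion parameters in total
active_params = 32_000_000_000     # 32 billion activated per token

active_fraction = active_params / total_params
print(f"Active per token: {active_fraction:.1%}")  # → Active per token: 3.2%
```

Only about 3% of the weights do work on any given token, which is why the inference bill looks like a 32B model's rather than a 1T model's.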
What really makes people gasp here isn't just the performance; it's that the model is open-weights. You can download it and run it yourself if you have the hardware, or use it via API for roughly 80 cents per million input tokens. Claude Opus and GPT-5.5, by comparison, are closed systems behind Anthropic and OpenAI.
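For a sense of what "use it via API" means in practice, here is a minimal sketch of an OpenAI-style chat request. The endpoint URL and the model name `kimi-k2.6` are assumptions for illustration, not confirmed values; check Moonshot AI's own documentation before relying on them.

```python
# Hypothetical sketch of calling K2.6 over an OpenAI-compatible chat API.
# The base URL and model identifier below are assumed, not confirmed.
import json
import os
import urllib.request


def build_request(prompt: str, model: str = "kimi-k2.6") -> urllib.request.Request:
    """Build (but do not send) a chat-completion request for one prompt."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://api.moonshot.ai/v1/chat/completions",  # assumed endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('MOONSHOT_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )


req = build_request("Fix the failing test in utils.py")
```

Sending it would be one `urllib.request.urlopen(req)` call; at the quoted price, a million input tokens of that traffic costs about $0.80.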
On SWE-Bench Pro, which measures the ability to resolve real GitHub issues, K2.6 scores 58.6%, putting it above both Claude Opus 4.6 and GPT-5.4 in one of the evaluations. On Humanity's Last Exam with tool access it lands at 54.0%, again ahead of Claude (53.0%) and GPT-5.4 (52.1%). It ranks number one among all 77 open-weights models on the Artificial Analysis Intelligence Index.
Something else noted in the discussion: the hallucination rate is down significantly from its predecessor K2.5, from 65% to 39%. Still not perfect, but now close to Claude Opus territory.
For developers working with agentic workflows, another detail worth noting: K2.6 supports so-called agent swarms, up to 300 parallel sub-agents that can run for over 12 hours straight. That isn't just a benchmark trick; it's designed for genuinely long-running autonomous coding.
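The swarm pattern itself is easy to picture: a coordinator fans a task list out to many sub-agents running concurrently and collects their results. The toy sketch below illustrates only that fan-out shape; it is a generic illustration, not Moonshot AI's actual orchestration API, and the `sub_agent` function is a stand-in for a real model call.

```python
# Toy illustration of the agent-swarm pattern: one coordinator dispatches
# tasks to parallel sub-agents and gathers results in order.
# Generic sketch only; not Moonshot AI's actual orchestration interface.
from concurrent.futures import ThreadPoolExecutor


def sub_agent(task: str) -> str:
    # Stand-in for one autonomous coding agent; a real swarm would
    # call the model here and possibly loop for hours on the task.
    return f"done: {task}"


def run_swarm(tasks: list[str], max_agents: int = 300) -> list[str]:
    """Run each task in its own worker, capped at max_agents in flight."""
    with ThreadPoolExecutor(max_workers=min(max_agents, len(tasks))) as pool:
        return list(pool.map(sub_agent, tasks))


results = run_swarm(["fix bug #1", "write tests", "refactor parser"])
```

The cap of 300 workers mirrors the parallelism figure quoted above; the interesting engineering is not the fan-out but keeping 300 agents coherent for a 12-hour run.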
How much of this is Moonshot AI hype and how much is real? The HN discussion is, as usual, healthily skeptical, and it's worth noting that the rankings shift depending on which evaluation you look at. But the signal is clear enough: open-weights models are catching up to, and on some benchmarks passing, proprietary frontier performance, and it's happening faster than most expected.
This is an early signal based on community sources from HN and independent technical assessments — not editorially verified by 24AI.
