A thread currently buzzing on Lobsters AI is about the sectorllm project, and the concept is so absurdly compact that it stops you mid-scroll.

Someone has written a functional Llama2 inference engine in x86 real mode assembly, compressed it to 1356 bytes, and made it boot directly from a disk sector. No Linux, no Windows, no runtime. You turn on the machine and the model starts generating text.
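To make "boots directly from a disk sector" concrete: a legacy BIOS loads the first 512 bytes of a disk into memory and jumps to them, but only if the last two bytes carry the 0x55 0xAA boot signature. Here's a minimal C sketch of that check; it's illustrative only, not part of sectorllm, and the image filename is made up:

```c
#include <stdio.h>

/* A legacy BIOS loads the first 512-byte sector of a disk to 0x7C00 and
   jumps to it, but only if bytes 510-511 hold the signature 0x55 0xAA.
   This checker validates a raw disk image against that rule. */
int main(void) {
    unsigned char sector[512];
    FILE *f = fopen("sectorllm.img", "rb"); /* hypothetical image name */
    if (!f || fread(sector, 1, 512, f) != 512) {
        fprintf(stderr, "could not read a full first sector\n");
        return 1;
    }
    fclose(f);
    if (sector[510] == 0x55 && sector[511] == 0xAA)
        printf("valid boot sector signature\n");
    else
        printf("not bootable: signature missing\n");
    return 0;
}
```

Note that 1356 bytes is just under three sectors, so the image presumably spans more than the boot sector itself; how the remaining bytes get loaded is one of the details worth reading the source for.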

Now, it's important to be honest about what this actually is: the project runs stories260K, a toy model with 260,000 parameters, a hardcoded architecture and prompt, and greedy argmax sampling. The context window tops out at 512 tokens. This isn't something you'll replace Claude with on Friday. As the project itself admits, performance and precision are not optimal; that's a deliberate trade-off for size.
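For readers who haven't met the term: greedy argmax sampling means no temperature, no top-k, no randomness at all. The next token is simply whichever vocabulary entry has the highest logit. A sketch in C (sectorllm itself does this in assembly):

```c
#include <stdio.h>

/* Greedy (argmax) sampling: the next token is whichever vocabulary
   entry scores the highest logit. No RNG, no sorting, no sampling
   code at all, which is exactly the kind of thing that saves bytes. */
static int argmax(const float *logits, int vocab_size) {
    int best = 0;
    for (int i = 1; i < vocab_size; i++) {
        if (logits[i] > logits[best]) best = i;
    }
    return best;
}

int main(void) {
    float logits[] = {0.1f, 2.7f, -1.3f, 2.5f};
    printf("next token id: %d\n", argmax(logits, 4)); /* prints 1 */
    return 0;
}
```

The cost of a sampler this small is that output is fully deterministic: the same prompt produces the same story every boot.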

The point isn't what it can do. The point is that it exists at all.

So why do people care? Because this is one of those rare projects that forces you to rethink what inference actually requires. The community discussion revolves around exactly that: what is the absolute floor? Can you go lower? What happens if you try a slightly larger model? The author mentions that stories15M would probably require a switch to protected mode, which breaks the whole concept.
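The reason is simple arithmetic. Assuming stories15M holds roughly 15 million weights (the name suggests as much; treat the exact figure as an assumption), even one byte per weight overflows real mode's address space many times over:

```c
#include <stdio.h>

int main(void) {
    /* Real mode exposes a 20-bit physical address space: about 1 MB. */
    const long real_mode_bytes = 1L << 20;   /* 1,048,576 bytes   */
    /* Assumed parameter count for stories15M (~15M, per the name). */
    const long params = 15L * 1000 * 1000;
    const long bytes_at_int8 = params * 1;   /* 1 byte per weight */
    printf("weights at 1 byte each: %ld bytes\n", bytes_at_int8);
    printf("real mode limit:        %ld bytes\n", real_mode_bytes);
    printf("overshoot:              ~%ldx\n", bytes_at_int8 / real_mode_bytes);
    return 0;
}
```

No quantization trick closes a roughly 14x gap, hence protected mode with its 32-bit address space, or nothing.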

It's also a technical curiosity that the project operates in x86 real mode, a mode modern operating systems leave behind moments after boot, where you have roughly 1 MB of addressable memory to work with. That transformer inference is possible there at all, even with a tiny model, is not trivial.
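The 1 MB figure falls out of how real mode forms addresses: a 16-bit segment shifted left by four bits plus a 16-bit offset yields a 20-bit physical address. A small sketch of the calculation (illustrative, not from the project):

```c
#include <stdio.h>
#include <stdint.h>

/* Real mode address formation: physical = (segment << 4) + offset.
   Twenty bits of address means roughly 1 MB of reachable memory. */
static uint32_t phys(uint16_t segment, uint16_t offset) {
    return ((uint32_t)segment << 4) + offset;
}

int main(void) {
    /* 0x07C0:0x0000 -> 0x07C00, where the BIOS loads the boot sector. */
    printf("boot sector load address: 0x%05X\n", (unsigned)phys(0x07C0, 0x0000));
    /* The highest combination, 0xFFFF:0xFFFF, lands just past 1 MB;
       on classic hardware with the A20 line masked it wraps to low memory. */
    printf("top of real mode:         0x%X\n", (unsigned)phys(0xFFFF, 0xFFFF));
    return 0;
}
```

Every weight, activation, and the code itself has to fit under that ceiling at once, which is why even a 260K-parameter model is a tight squeeze.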

For people working with edge AI or embedded systems, or anyone curious about how low modern hardware can go without the abstraction layers we're used to, this is genuinely interesting engineering work. It's also a reminder that the AI field still has room for people who think in bytes, not just in billions of parameters.

Be aware that this is an early signal from a niche community: not all technical claims have been independently verified yet, and the project is openly available on GitHub for anyone who wants to dig in themselves.

Worth following if you're in the edge/embedded world.