A thread currently buzzing on Lobsters AI is about the sectorllm project, and the concept is so absurdly compact that it stops you mid-scroll.

Someone has written a functional Llama2 inference engine in x86 real mode assembly, compressed it to 1356 bytes, and made it boot directly from a disk sector. No Linux, no Windows, no runtime. You turn on the machine and the model starts generating text.
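To make "boots directly from a disk sector" concrete: a legacy BIOS loads the first 512 bytes of a disk into memory and jumps to them, but only if the last two bytes carry the 0x55 0xAA boot signature. Here's a minimal C sketch of that check; it's illustrative only, not part of sectorllm, and the image filename is made up:

```c
#include <stdio.h>

/* A legacy BIOS loads the first 512-byte sector of a disk to 0x7C00 and
   jumps to it, but only if bytes 510-511 hold the signature 0x55 0xAA.
   This checker validates a raw disk image against that rule. */
int main(void) {
    unsigned char sector[512];
    FILE *f = fopen("sectorllm.img", "rb"); /* hypothetical image name */
    if (!f || fread(sector, 1, 512, f) != 512) {
        fprintf(stderr, "could not read a full first sector\n");
        return 1;
    }
    fclose(f);
    if (sector[510] == 0x55 && sector[511] == 0xAA)
        printf("valid boot sector signature\n");
    else
        printf("not bootable: signature missing\n");
    return 0;
}
```

Note that 1356 bytes is just under three sectors, so the image presumably spans more than the boot sector itself; how the remaining bytes get loaded is one of the details worth reading the source for.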

Now, it's important to be honest about what this actually is: the project runs stories260K, a toy model with 260,000 parameters, a hardcoded architecture and prompt, and greedy argmax sampling. The context window tops out at 512 tokens. This isn't something you'll replace Claude with on Friday. As the project itself admits, performance and precision are not optimal; that's a deliberate trade-off for size.
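For readers who haven't met the term: greedy argmax sampling means no temperature, no top-k, no randomness at all. The next token is simply whichever vocabulary entry has the highest logit. A sketch in C (sectorllm itself does this in assembly):

```c
#include <stdio.h>

/* Greedy (argmax) sampling: the next token is whichever vocabulary
   entry scores the highest logit. No RNG, no sorting, no sampling
   code at all, which is exactly the kind of thing that saves bytes. */
static int argmax(const float *logits, int vocab_size) {
    int best = 0;
    for (int i = 1; i < vocab_size; i++) {
        if (logits[i] > logits[best]) best = i;
    }
    return best;
}

int main(void) {
    float logits[] = {0.1f, 2.7f, -1.3f, 2.5f};
    printf("next token id: %d\n", argmax(logits, 4)); /* prints 1 */
    return 0;
}
```

The cost of a sampler this small is that output is fully deterministic: the same prompt produces the same story every boot.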

The point isn't what it can do. The point is that it exists at all.

So why do people care? Because this is one of those rare projects that forces you to rethink what inference actually requires. The community discussion revolves around exactly that: what is the absolute floor? Can you go lower? What happens if you try a slightly larger model? The author mentions that stories15M would probably require a switch to protected mode, which breaks the whole concept.
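The reason is simple arithmetic. Assuming stories15M holds roughly 15 million weights (the name suggests as much; treat the exact figure as an assumption), even one byte per weight overflows real mode's address space many times over:

```c
#include <stdio.h>

int main(void) {
    /* Real mode exposes a 20-bit physical address space: about 1 MB. */
    const long real_mode_bytes = 1L << 20;   /* 1,048,576 bytes   */
    /* Assumed parameter count for stories15M (~15M, per the name). */
    const long params = 15L * 1000 * 1000;
    const long bytes_at_int8 = params * 1;   /* 1 byte per weight */
    printf("weights at 1 byte each: %ld bytes\n", bytes_at_int8);
    printf("real mode limit:        %ld bytes\n", real_mode_bytes);
    printf("overshoot:              ~%ldx\n", bytes_at_int8 / real_mode_bytes);
    return 0;
}
```

No quantization trick closes a roughly 14x gap, hence protected mode with its 32-bit address space, or nothing.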

It's also a technical curiosity that the project operates in x86 real mode, a mode modern operating systems leave behind moments after boot, where you have roughly 1 MB of addressable memory to work with. That transformer inference is possible there at all, even with a tiny model, is not trivial.
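The 1 MB figure falls out of how real mode forms addresses: a 16-bit segment shifted left by four bits plus a 16-bit offset yields a 20-bit physical address. A small sketch of the calculation (illustrative, not from the project):

```c
#include <stdio.h>
#include <stdint.h>

/* Real mode address formation: physical = (segment << 4) + offset.
   Twenty bits of address means roughly 1 MB of reachable memory. */
static uint32_t phys(uint16_t segment, uint16_t offset) {
    return ((uint32_t)segment << 4) + offset;
}

int main(void) {
    /* 0x07C0:0x0000 -> 0x07C00, where the BIOS loads the boot sector. */
    printf("boot sector load address: 0x%05X\n", (unsigned)phys(0x07C0, 0x0000));
    /* The highest combination, 0xFFFF:0xFFFF, lands just past 1 MB;
       on classic hardware with the A20 line masked it wraps to low memory. */
    printf("top of real mode:         0x%X\n", (unsigned)phys(0xFFFF, 0xFFFF));
    return 0;
}
```

Every weight, activation, and the code itself has to fit under that ceiling at once, which is why even a 260K-parameter model is a tight squeeze.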

For people working with edge AI or embedded systems, or anyone curious about how low modern hardware can go without the abstraction layers we're used to, this is genuinely interesting engineering work. It's also a reminder that the AI field still has room for people who think in bytes, not just in billions of parameters.

Be aware that this is an early signal from a niche community: not all technical claims have been independently verified yet, and the project is openly available on GitHub for anyone who wants to dig in themselves.

Worth following if you're in the edge/embedded world.