A thread on Lobsters AI is buzzing right now around Baidu's fresh open-source release: Unlimited-OCR. And while Baidu is hardly an unfamiliar name, this is something different from what they typically deliver.
At the heart of the matter is a concrete technical problem that anyone who has worked with document parsing knows all too well: the longer the document, the more existing OCR models struggle. The KV cache grows, speed drops, and beyond 50 pages accuracy starts to crumble. Traditional solutions handle this by chopping the document up page by page, but that loses context between pages and makes the whole thing an engineering band-aid rather than a proper solution.
Unlimited-OCR does something fundamentally different. It introduces Reference Sliding Window Attention (R-SWA), an attention mechanism that keeps the KV cache at a constant size throughout the entire decoding process, regardless of how long the output becomes. This means the model can process 40, 100, or even more pages in a single pass under the 32K token limit, without speed degrading along the way.
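To make the constant-cache idea concrete, here is a toy sketch of a bounded cache. It rests on an assumption that is not confirmed by the release: that "reference" means a fixed set of entries (for example, the encoded document input) that is never evicted, while entries for generated tokens live in a sliding window. The class and function names are hypothetical, and the strings stand in for real key/value tensors.

```python
from collections import deque


class BoundedKVCache:
    """Toy cache: fixed reference entries plus a sliding window of recent ones.

    Assumption (not from the release): reference entries are never evicted,
    and only entries for generated tokens slide.
    """

    def __init__(self, reference_entries, window_size):
        self.reference = list(reference_entries)  # kept for the whole decode
        self.window = deque(maxlen=window_size)   # oldest entry drops automatically

    def append(self, entry):
        """Add the entry for a newly generated token."""
        self.window.append(entry)

    def visible(self):
        """Entries the next decoding step would attend to."""
        return self.reference + list(self.window)

    def __len__(self):
        return len(self.reference) + len(self.window)


def simulate(num_reference, window_size, num_generated):
    """Return the cache size after each generated token."""
    cache = BoundedKVCache([f"ref{i}" for i in range(num_reference)], window_size)
    sizes = []
    for step in range(num_generated):
        cache.append(f"tok{step}")
        sizes.append(len(cache))
    return sizes


sizes = simulate(num_reference=4, window_size=3, num_generated=10)
print(sizes)  # [5, 6, 7, 7, 7, 7, 7, 7, 7, 7]
```

Once the window fills, the cache stops growing no matter how many tokens are generated. A plain KV cache would instead grow by one entry per generated token.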
The numbers circulating are quite impressive: 93.92% on OmniDocBench v1.6, around 7,800 tokens per second at 6,000 output tokens, and a 100-page PDF completed in 8–12 seconds. By comparison, traditional pipelines take 45–90 seconds and require post-processing on top of that.
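For a rough sense of what those figures imply, the arithmetic can be spelled out. This is only a back-of-the-envelope sketch that takes the quoted numbers at face value; the helper names are made up for illustration, and nothing here is a measurement.

```python
def seconds_for(tokens, tokens_per_second):
    """Time to emit `tokens` at a steady decoding rate."""
    return tokens / tokens_per_second


def speedup_range(fast_range, slow_range):
    """(min, max) speedup given (low, high) runtimes in seconds."""
    fast_lo, fast_hi = fast_range
    slow_lo, slow_hi = slow_range
    # Worst case: slowest fast run vs. fastest slow run, and vice versa.
    return slow_lo / fast_hi, slow_hi / fast_lo


decode_time = seconds_for(6000, 7800)
lo, hi = speedup_range((8, 12), (45, 90))
print(f"{decode_time:.2f} s for 6,000 output tokens")  # 0.77 s
print(f"speedup between {lo:.2f}x and {hi:.2f}x")      # 3.75x and 11.25x
```

So the claimed advantage over traditional pipelines lands somewhere between roughly 4x and 11x, depending on which ends of the two ranges are compared.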

The model is built on a Mixture-of-Experts architecture with 3 billion total parameters, but only 500 million activated during inference. That makes it relatively easy to run locally — something community members have already started testing. One important practical detail recurring in the comments: GGUF quantizations currently require a specific llama.cpp build (PR #17400) until DeepSeek-OCR support lands in the main branch.
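A quick estimate shows why the parameter counts matter for local use. The sketch below assumes a uniform bit-width per weight and ignores the KV cache, activations, and file-format overhead. Real GGUF quantizations mix bit-widths, so treat the output as a rough lower bound rather than actual file sizes. Note that with Mixture-of-Experts, all 3 billion weights typically still need to be loaded; the 500 million figure reduces compute per token, not storage.

```python
TOTAL_PARAMS = 3_000_000_000   # total parameters, as quoted above
ACTIVE_PARAMS = 500_000_000    # parameters activated during inference, as quoted


def weight_gib(params, bits_per_weight):
    """Approximate weight storage in GiB at a uniform bit-width (assumption)."""
    return params * bits_per_weight / 8 / 2**30


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"active share per token: {active_fraction:.1%}")

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_gib(TOTAL_PARAMS, bits):.2f} GiB")
```

At 16 bits the weights alone come to about 5.6 GiB, and a 4-bit quantization brings that down to about 1.4 GiB, which fits on commodity hardware.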
Why is this interesting beyond benchmarks? Because this is open source, and because the R-SWA mechanism is presented as general-purpose — Baidu suggests it could be applied to ASR and translation as well. If that holds up, it's an architectural move that could quickly find its way into other projects.
These are, of course, early signals from the community, and we haven't yet seen independent replications at any significant scale. But the engagement on Lobsters suggests people are actually testing, not just reading.
