An article that surfaced on the Lobsters AI scene — 0xkato.xyz — has attracted an unusual amount of attention over the past few days. The title is almost provocatively simple: How LLMs Actually Work. But the comment section is the reason we're taking note of this now.
Because the energy in the comments isn't just "ooh, interesting intro to transformers." What people are actually discussing is everything that isn't a transformer, and why that might matter.
The quantity and quality of discussion around RNN variants — namely Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) — is noticeably higher than you'd expect from an "introductory article." It seems many practitioners are tired of reading about GPT architecture for the twelfth time, and would rather talk about what's actually running in embedded systems, on edge hardware, and in real-time applications where transformers are too slow and too heavy.
This isn't an academic debate. The embedded AI market, estimated at nearly 20 billion dollars, runs almost entirely on RNN-based architectures like GRU and LSTM, not on the large transformer models we hear about in the mainstream. GRUs are especially popular because they use fewer gates, and therefore fewer parameters, than LSTMs, which makes them faster to train and easier to tune, and they often perform comparably on short to medium-length sequences.
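One reason GRUs tend to train faster than LSTMs is that a GRU layer has three gate-like weight blocks (reset, update, candidate) where an LSTM has four (input, forget, output, cell candidate). A rough sketch of what that means in parameter counts is below. The function names and the layer sizes are illustrative assumptions, and the count uses one bias vector per gate; frameworks that keep two bias vectors per gate report slightly larger totals, though the 3:4 ratio holds either way.

```python
def gated_rnn_params(input_size: int, hidden_size: int, num_gates: int) -> int:
    """Parameter count for one gated recurrent layer.

    Assumes each gate has an input weight matrix, a recurrent weight
    matrix, and a single bias vector (a simplification; some libraries
    store two bias vectors per gate).
    """
    per_gate = hidden_size * (input_size + hidden_size) + hidden_size
    return num_gates * per_gate


def lstm_params(input_size: int, hidden_size: int) -> int:
    # LSTM: input, forget, and output gates plus the cell candidate.
    return gated_rnn_params(input_size, hidden_size, num_gates=4)


def gru_params(input_size: int, hidden_size: int) -> int:
    # GRU: reset and update gates plus the candidate hidden state.
    return gated_rnn_params(input_size, hidden_size, num_gates=3)


if __name__ == "__main__":
    # Hypothetical sizes, picked only for illustration.
    input_size, hidden_size = 64, 128
    lstm = lstm_params(input_size, hidden_size)
    gru = gru_params(input_size, hidden_size)
    print(f"LSTM parameters: {lstm}")
    print(f"GRU parameters:  {gru}")
    print(f"GRU / LSTM ratio: {gru / lstm:.2f}")
```

Under these assumptions a GRU layer carries three quarters of the parameters of an LSTM layer of the same width, which matters on memory-constrained edge hardware. Parameter count is only part of the picture, so this says nothing on its own about accuracy for a given task.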

What's interesting about this wave of engagement is the timing. In parallel, we're seeing State-Space Models (SSMs) like Mamba beginning to attract more serious attention as a third alternative — neither traditional RNN nor full transformer. The conversation on Lobsters suggests that a number of developers are in the process of reconsidering architecture choices they took for granted two years ago.
Of course, this is an early signal from community sources, not a peer-reviewed study. Lobsters is a relatively niche network for technically oriented developers, and comment sections are not representative of the industry as a whole. But exactly this type of conversation has previously been a precursor to shifts in what people actually build.
Worth watching to see whether this energy around non-transformer architectures starts appearing on r/LocalLLaMA and Hacker News over the coming weeks.
