A rapidly moving thread on r/MachineLearning right now isn't about GPT-5 or Gemini Ultra; it's about a PDF attachment from an unnamed user on a Korean AI forum. A user from "The Singularity Gallery" community felt the proof was too important to be buried in a local thread, so they translated it and shared it globally. The result: 197 points and a comment section where people are actually working through the equations instead of posturing.

The claim itself is controversial in the best way. For nine years we've lived with self-attention as an O(n²d) operation, quadratic in sequence length n. This is why long context windows are so costly, and it's why entire lines of research have focused on circumventing it. FlashAttention, sparse attention, linear attention: all are, in one way or another, responses to the n² term (FlashAttention reduces memory traffic rather than arithmetic, while sparse and linear variants attack the compute itself).
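For context, here is a minimal NumPy sketch (my own illustration, not taken from the proof) of standard softmax attention, showing exactly where the n × n matrix, and hence the quadratic cost, appears:

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard scaled-dot-product attention. The score matrix is
    n x n, so time and memory grow quadratically in sequence length n."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (n, n): the O(n^2 d) step
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V                              # (n, d) output

n, d = 512, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = softmax_attention(Q, K, V)
print(out.shape)  # (512, 64)
```

Doubling n here quadruples the size of `scores`, which is the whole story behind expensive context windows.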

The anonymous proof, called "The d² Pullback Theorem", argues that the n² bottleneck is self-imposed. Softmax normalization, the very heart of classical attention, forces the attention matrix to full rank n and destroys what the author calls a "Euclidean Matching structure." In other words: we have paid an astronomical computational price for a mathematical property we ourselves introduced.

The proposed solution is "Centered Shifted-Quadratic (CSQ) Attention": softmax is replaced by a degree-2 polynomial kernel (x²). According to the proof, this yields O(nd³) complexity, linear in sequence length n but cubic in head dimension d. Comparing the two costs, nd³ beats n²d exactly when n > d², so for sufficiently long sequences the saving is dramatic.
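To see how a degree-2 polynomial kernel sidesteps the n × n matrix, here is an illustrative NumPy sketch. To be clear, this is a generic linearization of the (q·k)² kernel, not the author's actual CSQ construction (the "centered, shifted" details aren't spelled out in the thread summary); it only demonstrates the complexity argument:

```python
import numpy as np

def poly2_attention(Q, K, V):
    """Degree-2 polynomial-kernel attention, linear in n.
    The kernel (q . k)^2 equals phi(q) . phi(k) with phi(x) = vec(x x^T),
    so by associativity we compute phi(Q) @ (phi(K).T @ V) and never
    form the n x n matrix: cost O(n d^3) instead of O(n^2 d)."""
    def phi(X):                         # (n, d) -> (n, d*d) feature map
        return np.einsum('ni,nj->nij', X, X).reshape(X.shape[0], -1)
    PQ, PK = phi(Q), phi(K)
    kv = PK.T @ V                       # (d^2, d): no n x n term anywhere
    z = PQ @ PK.sum(axis=0)             # per-row normalizer, shape (n,)
    return (PQ @ kv) / z[:, None]

# Sanity check against the explicit quadratic formulation:
n, d = 128, 16
rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
W = (Q @ K.T) ** 2                      # explicit n x n kernel matrix
ref = (W / W.sum(axis=-1, keepdims=True)) @ V
print(np.allclose(poly2_attention(Q, K, V), ref))  # True
```

The feature map has d² entries, which is where the d³ comes from; the trade is worthwhile precisely when n outgrows d².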

If this holds true, for nine years we've been paying the n² price for a problem that was actually d²-dimensional.

Now, it's important to keep a cool head here. This is an early signal from community sources, not a peer-reviewed paper. No one has formally verified the proof yet, and there are good reasons softmax sits where it does, including training stability and the interpretability of attention weights as a probability distribution. The comment section on Reddit is divided: some believe the math looks solid, while others point to possible holes in the argument over how much of attention's semantics CSQ actually preserves.

But it is precisely this excitement that makes the story worth following. If a single anonymous post from a Korean forum can start a serious debate about the fundamental complexity of the Transformer architecture, that's a sign community-driven research is beginning to match institutional research in impact. Watch whether bigger names start to comment; that will say a lot about whether this deserves a full replication study.