Gareth Dwyer recently published an article on dwyer.co.za that is currently generating heated discussion on Hacker News. The title says it all: Claude mixes up who actually said what, and Dwyer argues that this is not okay.
What makes this particularly interesting is that it isn't the usual "model making things up" kind of hallucination we are used to discussing. Here, Claude Code, Anthropic's coding assistant, appears to send messages to itself as part of its internal processing and then erroneously attributes those messages to the user. In other words, the model thinks you said something you never said, because it mixes its own thought process in with your input.
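To make the failure mode concrete, here is a minimal sketch of how a harness that injects its own bookkeeping messages under the user role could leave the model convinced the human gave instructions they never gave. The message format, the inject_harness_note helper, and the injected note are all hypothetical; this illustrates the reported behavior, not Anthropic's actual Claude Code implementation.

```python
# Hypothetical illustration of the attribution problem described above.
# The transcript format mimics a generic chat-style message list; none of
# this is Anthropic's real harness code.

conversation = [
    {"role": "user", "content": "Add a retry to the upload function."},
]

def inject_harness_note(history, note):
    """Append a harness-generated note to the transcript as a *user* message.

    Because the note carries role='user', the model has no way to tell it
    apart from something the human actually typed.
    """
    history.append({"role": "user", "content": note})

# The harness talks to itself mid-task...
inject_harness_note(conversation, "Keep the diff minimal and avoid touching tests.")

# ...and when the model later summarizes "your" instructions, the harness's
# own note shows up as if the human had said it.
user_said = [m["content"] for m in conversation if m["role"] == "user"]
print("Instructions attributed to the user:")
for line in user_said:
    print(" -", line)
```

In this toy setup, tagging injected context with a distinct role or an explicit marker would let the model tell harness-generated text apart from genuine user input; whether that is the actual fix needed here is exactly what the HN thread is debating.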
The comment section on HN is full of developers either nodding in recognition or expressing shock. Several describe similar experiences with Claude Code, where the model suddenly refers to instructions or context the user never explicitly provided. What was previously dismissed as odd isolated incidents is starting to look like a systematic pattern.
Why does this matter? Because attribution errors of this kind are far more insidious than ordinary hallucinations. When a model invents a fact, you can usually check it. But when the model wrongly attributes an action or a statement to you, and then uses it as the basis for further reasoning, the logic of the entire conversation can unravel without you noticing. If Claude decides on its own to, say, "keep the changes minimal" and then treats that as your instruction, every subsequent suggestion is shaped by a constraint you never gave.
Benchmark data we have reviewed suggests this is part of a broader industry problem: in certain benchmarks, GPT-4o fabricated or paraphrased quotes in over half of the test cases, while Gemini 1.5 Pro performed considerably better. Ironically, Claude has previously been praised for refusing to generate false quotes from public figures, which makes this harness bug all the more surprising.
This is one of those early-signal moments where the community discussion is far ahead of any official statement. Anthropic has not yet publicly commented on the matter. Whether this is an isolated implementation error in the Claude Code harness or something that goes deeper into the model's architecture, we do not know yet.
Worth following closely. And perhaps double-check what "instructions" Claude thinks it has received from you next time you use it.
Source: Hacker News AI Best + dwyer.co.za — community-based early signals, not verified by Anthropic.
