Gareth Dwyer recently published an article on dwyer.co.za that is currently generating heated discussion on Hacker News. The title says it all: Claude mixes up who actually said what, and Dwyer argues that this is not okay.
What makes this particularly interesting is that it isn't the usual "model making things up" kind of hallucination we are used to discussing. Here, Claude Code, Anthropic's coding assistant, appears to send messages to itself as part of its internal processing and then erroneously attributes those messages to the user. In other words, the model thinks you said something you never said, because it mixes its own thought process in with your input.
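To make the failure mode concrete, here is a minimal sketch of how a harness that injects its own bookkeeping messages under the user role could leave the model convinced the human gave instructions they never gave. The message format, the inject_harness_note helper, and the injected note are all hypothetical; this illustrates the reported behavior, not Anthropic's actual Claude Code implementation.

```python
# Hypothetical illustration of the attribution problem described above.
# The transcript format mimics a generic chat-style message list; none of
# this is Anthropic's real harness code.

conversation = [
    {"role": "user", "content": "Add a retry to the upload function."},
]

def inject_harness_note(history, note):
    """Append a harness-generated note to the transcript as a *user* message.

    Because the note carries role='user', the model has no way to tell it
    apart from something the human actually typed.
    """
    history.append({"role": "user", "content": note})

# The harness talks to itself mid-task...
inject_harness_note(conversation, "Keep the diff minimal and avoid touching tests.")

# ...and when the model later summarizes "your" instructions, the harness's
# own note shows up as if the human had said it.
user_said = [m["content"] for m in conversation if m["role"] == "user"]
print("Instructions attributed to the user:")
for line in user_said:
    print(" -", line)
```

In this toy setup, tagging injected context with a distinct role or an explicit marker would let the model tell harness-generated text apart from genuine user input; whether that is the actual fix needed here is exactly what the HN thread is debating.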
The comment section on HN is full of developers either nodding in recognition or expressing shock. Several describe similar experiences with Claude Code, where the model suddenly refers to instructions or context the user never explicitly provided. What was previously dismissed as odd isolated incidents is starting to look like a systematic pattern.
Why does this matter? Because attribution errors of this kind are far more insidious than ordinary hallucinations. When a model invents a fact, you can usually check it. But when the model wrongly attributes an action or a statement to you, and then uses it as the basis for further reasoning, the logic of the entire conversation can unravel without you noticing. If Claude decides on its own to, say, "keep the changes minimal" and then treats that as your instruction, every subsequent suggestion is shaped by a constraint you never gave.
Benchmark data we have reviewed suggests this is part of a broader industry problem: in certain benchmarks, GPT-4o fabricated or paraphrased quotes in over half of the test cases, while Gemini 1.5 Pro performed considerably better. Ironically, Claude has previously been praised for refusing to generate false quotes from public figures, which makes this harness bug all the more surprising.
This is one of those early-signal moments where the community discussion is far ahead of any official statement. Anthropic has not yet publicly commented on the matter. Whether this is an isolated implementation error in the Claude Code harness or something that goes deeper into the model's architecture, we do not know yet.
Worth following closely. And perhaps double-check what "instructions" Claude thinks it has received from you next time you use it.
Source: Hacker News AI Best + dwyer.co.za — community-based early signals, not verified by Anthropic.
