Conventional wisdom has long held that pseudonymity – posting online without a full name – offers reasonably good protection for ordinary users: not because it is impossible to crack, but because cracking it costs too much in time and resources. That protection is now on the verge of disappearing.

New research from ETH Zurich, UC Berkeley, Anthropic, Google, and the ML Alignment & Theory Scholars (MATS) program documents that large language models (LLMs) can deanonymize online users at scale – faster, cheaper, and more precisely than any previous method.

Identified 226 out of 338 Users for Under 20,000 Kroner

In a key experiment, researchers linked pseudonymous Hacker News users to real LinkedIn profiles from a pool of 89,000 candidates. The result: 226 out of 338 users were correctly identified, which corresponds to 67% recall at 90% precision, according to the research material. Classical methods, by comparison, achieved close to zero percent accuracy at a similar precision level.
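
To unpack those figures: recall measures how many of the real users were found, while precision measures how many of the system's asserted matches were correct. A back-of-the-envelope calculation (ours, not the researchers' code – the totals for guesses and errors are derived from the reported numbers, not stated in the material) shows what that operating point implies:

    # Illustrative arithmetic only – not the researchers' code.
    true_users = 338       # pseudonymous Hacker News users in the experiment
    correct_matches = 226  # users correctly linked to a LinkedIn profile
    precision = 0.90       # reported operating point

    recall = correct_matches / true_users  # 226/338 ≈ 0.669, reported as 67%
    guesses = correct_matches / precision  # ≈ 251 asserted matches (derived)
    wrong = guesses - correct_matches      # ≈ 25 incorrect links (derived)

    print(f"recall ≈ {recall:.1%}, guesses ≈ {guesses:.0f}, wrong ≈ {wrong:.0f}")

In other words, the system volunteered roughly 250 identifications and was right about nine times out of ten.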

Total cost for the entire experiment: under 2,000 US dollars – around 20,000 Norwegian kroner. Per person, this amounts to between 14 and 56 kroner, depending on scale and methodology.

For Reddit users who posted in film forums, up to 45% recall at 99% precision was achieved. In one experiment, each user's comment history was split in two, with a one-year gap between the halves – and two-thirds of the users were still correctly matched to themselves. With traditional methods, comparable figures were below one percent.

ESRC: The Four-Step Machine That Reads You

Behind the results lies a methodology called ESRC – Extraction, Search, Reasoning, and Calibration. The system works exclusively with unstructured text and requires no manual effort from an investigator.

The system fundamentally differs from older deanonymization attacks – such as the well-known Netflix Prize attack from 2008 – which required structured datasets. ESRC operates directly on raw, unprocessed forum text.
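
Based on the paper's description of the four stages, a structural sketch of an ESRC-style pipeline might look like the following. Every function name and body here is an illustrative placeholder, not the researchers' implementation:

    # Hypothetical sketch of an ESRC-style pipeline (Extraction, Search,
    # Reasoning, Calibration). All stages are stubbed for illustration.

    def extract(posts: list[str]) -> dict:
        # Extraction: an LLM pulls identity cues from raw forum text
        # (location hints, profession, writing style, life events).
        return {"cues": posts}  # placeholder

    def search(cues: dict, pool: list[str]) -> list[str]:
        # Search: narrow a large candidate pool (e.g. 89,000 profiles)
        # to a shortlist cheap enough for per-candidate reasoning.
        return pool[:10]  # placeholder

    def reason(posts: list[str], candidate: str) -> float:
        # Reasoning: an LLM weighs the evidence that the posts and the
        # candidate profile belong to the same person; returns a raw score.
        return 0.5  # placeholder

    def calibrate(score: float) -> float:
        # Calibration: map raw scores to confidences so matches are only
        # asserted above a chosen precision threshold.
        return score  # placeholder (identity)

    def esrc(posts: list[str], pool: list[str], threshold: float = 0.90) -> list[str]:
        shortlist = search(extract(posts), pool)
        scored = [(c, calibrate(reason(posts, c))) for c in shortlist]
        return [c for c, confidence in scored if confidence >= threshold]

Note how only the small shortlist ever reaches the expensive reasoning stage – presumably part of what keeps the per-person cost in the tens-of-kroner range.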

'Practical Obscurity' Is No Longer Enough Protection

The researchers point out that a central privacy principle is now under pressure: practical obscurity – the idea that even if deanonymization is technically possible, it is so resource-intensive that it is rarely performed in practice.

Ask yourself: could a team of clever investigators figure out who you are from your posts? If so, LLM agents can probably do the same – and the cost is only decreasing.

The formulation comes from co-author Simon Lermen at ETH Zurich, according to the research material. Lead researcher Daniel Paleka says he was surprised by 'how little information is needed to link two accounts'.

The models can also infer personal attributes – place of residence, income level, age, and occupation – with up to 85% accuracy from Reddit posts alone, according to the same research material.
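
To illustrate what that kind of attribute inference looks like in practice: the exact prompts and model calls used in the research are not reproduced in the material, so `ask_llm` below is a hypothetical stand-in for any chat-model API, and the posts are invented examples:

    # Illustrative sketch of attribute inference from public posts.
    # `ask_llm` is a hypothetical placeholder, not a real API.

    def ask_llm(prompt: str) -> str:
        raise NotImplementedError("stand-in for an actual LLM API call")

    posts = [
        "Commute over the bridge was brutal again this morning.",
        "Our on-call rotation finally moved to follow-the-sun.",
    ]

    prompt = (
        "Based only on the content and style of these comments, estimate the "
        "author's likely city, age range, occupation and income bracket, and "
        "give a confidence for each guess:\n\n" + "\n".join(posts)
    )
    # answer = ask_llm(prompt)

The unsettling point is that no single post is revealing on its own; the attributes emerge from aggregating small cues across an entire comment history.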

Norwegian Implications: GDPR and Pseudonymization Under Pressure

For Norwegian businesses and public agencies, this is far from an abstract academic discussion.

Under GDPR, pseudonymization is considered a recognized technical measure to reduce risk when processing personal data. The Norwegian Data Protection Authority and European supervisory authorities have, in practice, accepted well-implemented pseudonymization as an element in risk assessments under Articles 25 and 32 of the General Data Protection Regulation.

When a commercial actor can break pseudonymity for under 50 kroner per person using openly available AI APIs, the technical basis for such assessments is significantly weakened.

The GDPR article on pseudonymization was not written for a world where a language model can re-identify people for the price of a cup of coffee.

This particularly affects:

  • Public sector: Norwegian municipalities, health trusts, and NAV (the Norwegian Labour and Welfare Administration) are increasingly conducting data-driven analysis based on pseudonymized datasets. If pseudonymization no longer provides sufficient protection against re-identification, it may require a complete revision of data processing agreements and Data Protection Impact Assessments (DPIA).
  • Business: Companies that use customer data, user reviews, or employee surveys under the assumption of anonymity may face real legal exposure if the data is actually re-identifiable.
  • Research and journalism: Anonymized interviews and source protection are under pressure. In the experiment against the partially redacted Anthropic Interviewer dataset – interviews with researchers, a subset of them anonymized – 9 out of 33 anonymized individuals were correctly identified with 82% precision.

What Does This Mean Going Forward?

The researchers estimate that around 27% recall is achievable at internet scale – i.e., against datasets with millions of candidates – a level that cannot be matched by non-LLM-based methods. Against one million candidates, 35% recall at 90% precision is projected.

It is worth emphasizing that the research currently describes what is technically possible under controlled conditions. The methods have not been validated in all possible real-world attack scenarios, and there are legitimate questions about their transferability to all types of pseudonymized datasets. Nevertheless, the direction is clear enough that supervisory authorities, legal professionals, and system owners should address the implications now.

The research is published by authors affiliated with ETH Zurich, UC Berkeley, Anthropic, and Google, and has been covered by Ars Technica (March 2026).