A Hacker News thread that is currently exploding — 855 points and almost 450 comments — is seemingly about something quite innocent: Anna's Archive has published an llms.txt file on its blog. The file is addressed directly to LLMs that crawl the web, encouraging them (and the people behind them) to donate to the archive. A bit meta, a bit funny.

But if you dig one layer deeper, you quickly realize that this is not a quirky PR stunt. It's almost a provocation.

Anna's Archive explicitly states that AI models have likely already been trained on its data, and now it wants to be paid for it.

The background is brutal. According to lawsuits and internal documents, the archive, which provides access to over 140 million digitized books and articles, has been a central training data source for some of the largest AI players in the world. Meta is alleged to have downloaded a full 81.7 terabytes of data from Anna's Archive and similar services. NVIDIA is being sued for attempting to secure direct access. DeepSeek has openly acknowledged training on 800,000 Chinese scientific books from the archive.

And the price tag for "legal" access? $100,000 in crypto — something at least 30 companies are said to have paid.

Just four days before the blog post appeared, a US federal court entered a $19.5 million judgment against the archive. The publishers who sued explicitly defined Anna's Archive as an AI training data hub, not just a piracy site. This is a legal move that could have consequences far beyond this single case.

What makes this interesting right now? The llms.txt file acts as a public confession wrapped in humor. The archive implicitly says: you've already used our data, you know it, we know it, so pay up. And the HN thread is frantically discussing what this means for the norms around web scraping, fair use, and what future training datasets will actually look like as the legal system tightens its grip.

This is still an early signal from community sources, and we don't know how the ongoing lawsuits against Meta and NVIDIA will end. But the direction is clear: the legally gray area the AI industry has operated in regarding training data is about to become considerably narrower.

Keep an eye on the HN thread — it's moving fast.