
+++
title = "Can antibots be both efficient and anonymous?"
date = 2025-09-20
description = "todo"
insert_anchor_links = "left"
draft = true

[taxonomies]
tags = ["cryptography"]

[extra]
katex = true
+++

Some people with a lot of money (or, at least, who control a lot of machines) decided to flood the Internet with useless requests, crawling every website while respecting neither robots.txt nor anything else. They have even forced some services to shut down, effectively performing a DDoS. All that to train AI on garbage data so you can ask a chatbot to quote Wikipedia or StackOverflow instead of using a proper search engine.

The Internet relies on a heap of protocols that only work well when people behave correctly: it stops being efficient when someone gains too much power (bandwidth and IP addresses). Cloud providers give bad guys enough clouds to make a storm, not to mention the "Internet of Things", which lets botnets run on security cameras, baby monitors and sex toys. One of the most common practices on the Internet is fundamentally altruistic: giving a copy of a file to whoever asks for it, for free (what is commonly called the "Web"). The problem is that answering such a request consumes a machine's resources (energy, computing time, I/O time, memory, etc.), resources that can be exhausted when too many people ask for too much.

## A few solutions

### Rate-limiting IP addresses

Counting traffic per IP address is the most basic solution and probably the first thing to do. Sadly, it is no longer enough: it cannot detect botnets, or even large IP ranges owned by a single entity. Lowering the threshold also leads to false positives, blocking an entire school or office when dozens of people share the same IP address.
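For concreteness, here is a minimal sketch of a fixed-window per-IP counter; the window length and threshold are arbitrary examples, not recommendations or any particular server's implementation.

```python
# Minimal fixed-window rate limiter keyed by IP address.
# Window length and threshold are arbitrary examples.
import time
from collections import defaultdict

WINDOW_SECONDS = 60
MAX_REQUESTS_PER_WINDOW = 100

# ip -> (window index, request count in that window)
_counters = defaultdict(lambda: (-1, 0))

def allow_request(ip: str) -> bool:
    """Return True if the request from `ip` is still under the per-window limit."""
    window = int(time.time()) // WINDOW_SECONDS
    last_window, count = _counters[ip]
    if last_window != window:
        _counters[ip] = (window, 1)   # new window: reset the counter
        return True
    if count < MAX_REQUESTS_PER_WINDOW:
        _counters[ip] = (window, count + 1)
        return True
    return False                      # over the limit: reject or throttle
```

As noted above, this falls apart as soon as many users share one address (false positives) or one operator controls many addresses (false negatives).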

### Proof of intelligence

Captchas ask the user to solve a problem that is (supposedly) difficult for a computer but (supposedly) easy for a human. However, they take time to solve even for a human, they are not accessible to people who cannot see or hear, or who have cognitive disabilities, and modern AIs can already solve them.

### Proof of browser

Systems that require no user input can check whether they are running in a proper web browser by testing various features. However, they can be fooled by giving the bot's engine more capabilities, at which point it becomes indistinguishable from a browser.

### Proof of work

Proof of work requires solving problems that are slow to solve but fast to check: a few seconds of computing time are needed to answer the challenge. The difficulty must be well balanced so that it stays fast enough for a legitimate user but too expensive for a spammer sending thousands of requests per second. This no longer seems to frighten spammers, though: Anubis (an antispam system based on proof of work) failed to stop some attacks. The gap between low-end computers and AI servers or botnets will likely only get larger, making PoW a non-viable solution.
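As an illustration, here is a Hashcash-style puzzle of the kind such systems typically use (SHA-256 with a leading-zero-bits target); the exact construction is illustrative and is not Anubis's actual algorithm.

```python
# Hashcash-style proof of work: find a nonce such that SHA-256(challenge || nonce)
# has at least `difficulty` leading zero bits. Solving takes ~2^difficulty hashes
# on average; checking takes a single hash.
import hashlib
from itertools import count

def _leading_zero_bits(digest: bytes) -> int:
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
        else:
            bits += 8 - byte.bit_length()
            break
    return bits

def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force a nonce meeting the difficulty target."""
    for nonce in count():
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if _leading_zero_bits(digest) >= difficulty:
            return nonce

def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: one hash to check the submitted nonce."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return _leading_zero_bits(digest) >= difficulty
```

Each extra bit of difficulty doubles the expected solving cost while the verification cost stays constant, which is exactly the asymmetry that stops helping once the attacker has vastly more hardware than the legitimate users.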

### Global monitoring

If you are big enough to have a global, real-time database of traffic per IP address (e.g. CloudFlare, Amazon, Google), you can detect spammy addresses and stop them immediately. However, such a centralized solution is not acceptable: it gives too much power to gigantic corporations and creates single points of failure (see the large-scale CloudFlare and AWS outages in 2025). Decentralized and anonymous spam databases may be an interesting research subject, but they seem quite complicated and insufficient against sudden attacks using disposable IP addresses.

## Toward a decentralized and privacy-sound solution

We will explore ways to provide global, per-human rate-limiting without a central entity and while respecting privacy.

### Linkable ring signatures

A ring signature lets the holder of one private key sign a message on behalf of a set of public keys (the ring) without revealing which member actually signed. A linkable ring signature (LRS) additionally outputs a linking tag: two signatures produced with the same private key carry the same tag and can therefore be recognized as coming from the same (still anonymous) signer.

Suppose a public identity set, linking public keys to verified unique persons. Each week, each identity can declare a limited number of temporary public keys (TPKs) in a signed bundle. At the end of the week, the TPK superset is frozen read-only.

When making a request, a client linked to an identity chooses one of its unused TPKs plus a number of other people's TPKs at random (avoiding picking two TPKs from the same identity). These TPKs form a ring against which the request is signed, and the request is sent with the ring and the signature attached. The server verifies that the ring is included in the TPK superset, then verifies the signature. It stores the signature's linking tag and increments that tag's request count. Finally, it responds with a token that authenticates the client and lets the server count its requests without requiring new signatures.
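To make the flow concrete, here is a sketch in Python. `lrs_sign`, `lrs_verify`, `issue_token` and `identity_of` are hypothetical placeholders (no real library is assumed); only the control flow mirrors the description above, and the ring size and budget are arbitrary examples.

```python
# Sketch of the request flow described above. lrs_sign / lrs_verify / issue_token
# and identity_of are hypothetical placeholders, not a real library.
import secrets

RING_SIZE = 1024        # arbitrary example
WEEKLY_BUDGET = 10_000  # arbitrary example of a per-person request budget

def build_ring(my_tpk, tpk_superset, identity_of):
    """Pick RING_SIZE - 1 decoy TPKs, never two TPKs from the same identity."""
    ring, identities = {my_tpk}, {identity_of(my_tpk)}
    while len(ring) < RING_SIZE:
        candidate = secrets.choice(tpk_superset)
        if identity_of(candidate) not in identities:
            ring.add(candidate)
            identities.add(identity_of(candidate))
    return sorted(ring)

def client_request(payload, my_tpk, my_secret, tpk_superset, identity_of):
    ring = build_ring(my_tpk, tpk_superset, identity_of)
    signature = lrs_sign(my_secret, ring, payload)               # hypothetical LRS API
    return {"payload": payload, "ring": ring, "sig": signature}

def server_handle(request, tpk_superset, counts):
    ring, sig = request["ring"], request["sig"]
    if not set(ring) <= set(tpk_superset):                       # ring within frozen superset?
        return "reject: unknown TPK in ring"
    ok, linking_tag = lrs_verify(ring, request["payload"], sig)  # hypothetical LRS API
    if not ok:
        return "reject: invalid ring signature"
    counts[linking_tag] = counts.get(linking_tag, 0) + 1         # count per anonymous signer
    if counts[linking_tag] > WEEKLY_BUDGET:
        return "reject: request budget exhausted"
    return issue_token(linking_tag)                              # reusable session token
```

The linking tag only reveals that two signatures were produced with the same key, not which ring member produced them, which is exactly what the per-identity counter needs.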

The client can reuse its token (or sign again with the same ring) as long as it remains identifiable by other means (typically, when using the same IP and user-agent).

To benefit from caching and amortized verification, which some logarithmic schemes allow, rings may instead be defined by a public pseudorandom function, so that there is a non-negligible chance that many users share the same ring.
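One possible instantiation, sketched below under assumed details: instead of each client picking random decoys as above, everyone partitions the frozen TPK superset into rings using a public hash keyed by the week identifier, so whole rings are shared and the server sees each ring many times.

```python
# Deterministic ring assignment: every client derives the same partition of the
# frozen TPK superset from a public hash, so rings are shared and the server can
# cache or amortize verification per ring. Illustrative construction only.
import hashlib

RING_SIZE = 1024  # arbitrary example

def assign_rings(week_id: bytes, tpk_superset: list[bytes]) -> list[list[bytes]]:
    # Sort TPKs by a pseudorandom value derived from the public week identifier,
    # then slice the result into consecutive rings of RING_SIZE keys.
    def prf(tpk: bytes) -> bytes:
        return hashlib.sha256(week_id + tpk).digest()
    shuffled = sorted(tpk_superset, key=prf)
    return [shuffled[i:i + RING_SIZE] for i in range(0, len(shuffled), RING_SIZE)]

def ring_of(my_tpk: bytes, rings: list[list[bytes]]) -> list[bytes]:
    """Find the unique ring containing my TPK (assumes it is in the superset)."""
    return next(ring for ring in rings if my_tpk in ring)
```

Since the partition is fixed for the week, whatever per-ring work the verification scheme allows to precompute can be reused across every request signed against the same ring.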

Example cost: the scheme of https://eprint.iacr.org/2024/553.pdf produces signatures of about 29 kB with rings of size 1024; verification takes 128 ms, then 0.3 ms after amortization.

### PrivacyPass

https://datatracker.ietf.org/doc/html/rfc9576