diff --git a/content/blog/lrs-antispam.md b/content/blog/lrs-antispam.md
index affb1c5..c4d71da 100644
--- a/content/blog/lrs-antispam.md
+++ b/content/blog/lrs-antispam.md
@@ -10,7 +10,7 @@ tags = ["cryptography"]
 katex = true
 +++

-Some people with a lot of money (or, at least, who control a lot of machines) decided to flood the Internet with useless requests, crawling every website respecting neither robots.txt nor anything else. They even made some services shut down, effectively performing a DDoS. All that to train AI with garbage data so you can ask a chatbot to quote Wikipedia or StackOverflow instead of using a proper search engine.
+Some people with a lot of money (or, at least, who control a lot of machines) decided to flood the Internet with useless requests, crawling every website, respecting neither robots.txt nor anything else. They have even forced some services to shut down, [effectively performing a DDoS](https://drewdevault.com/2025/03/17/2025-03-17-Stop-externalizing-your-costs-on-me.html). All that to train AI with garbage data so you can ask a chatbot to quote Wikipedia or StackOverflow instead of using a proper search engine.

 The Internet relies on a heap of protocols that only work well when people behave correctly: it stops being efficient when someone gains too much power (bandwidth and IP addresses). Cloud providers indeed provide bad guys with enough clouds to make a storm, not to mention the "Internet of Things" that allows botnets to run on security cameras, baby monitors and sextoys. One of the most common practices on the Internet is fundamentally altruistic: giving a copy of a file to whomever is asking for it, for free (what is commonly called the "Web"). The problem is that answering such a request consumes a machine's resources (energy, computing time, IO time, memory, etc.), resources that can be exhausted if people are asking too much.
@@ -18,15 +18,15 @@ The Internet relies on a heap of protocols that only work well when people behav

 ### Rate-limiting IP addresses

-The most basic solution of counting traffic per IP address is probably the first thing to do. Sadly it is not enough anymore, as it cannot detect botnets or even large IP regions owned by a single entity. Lowering the threshold also leads to false positives, blocking an entire school or office when dozens of people are sharing the same IP address.
+The most basic solution of counting traffic per IP address is probably the first thing to do. Sadly, it is not enough anymore, as it cannot detect [botnets](https://www.trendmicro.com/vinfo/us/security/news/vulnerabilities-and-exploits/a-closer-exploration-of-residential-proxies-and-captcha-breaking-services) or even large IP regions owned by a single entity. Lowering the threshold also leads to false positives, blocking an entire school or office when dozens of people are sharing the same IP address.
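+
+As a concrete illustration of this first line of defence, here is a minimal sketch of per-IP counting with a fixed time window; the window length and threshold are arbitrary example values, not recommendations.
+
+```python
+import time
+from collections import defaultdict
+
+WINDOW = 60         # length of a counting window, in seconds (example value)
+MAX_REQUESTS = 100  # requests allowed per IP address and per window (example value)
+
+counters: dict[tuple[str, int], int] = defaultdict(int)
+
+def allow(ip: str) -> bool:
+    """Count one request from `ip` and tell whether it stays under the threshold."""
+    window = int(time.time() // WINDOW)
+    counters[(ip, window)] += 1
+    return counters[(ip, window)] <= MAX_REQUESTS
+```
+
+Every client behind the same NAT shares one counter here, which is exactly the false-positive problem mentioned above.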

 ### Proof of intelligence

-Captchas ask the user to solve a problem that is (supposedly) difficult for a computer, but (supposedly) easy for a human. However they take time to solve even for a human, they are not accessible to people who can't see or hear or have mental disabilities, and modern AIs can already solve them.
+Captchas ask the user to solve a problem that is (supposedly) difficult for a computer, but (supposedly) easy for a human. However, they take time to solve even for a human, they are not accessible to people who can't see or hear or who have mental disabilities, and modern AIs can already solve them.
 The irony is that their main purpose is not to help us filter bots, but to help Google train AI: first it was character recognition for scanning old books; now it is image categorization for self-driving cars and voice transcription for voice assistants and targeted advertising.

 ### Proof of browser

-Systems that do not require user input can check whether they are being run in a proper web browser, by testing various features. However they can be fooled by giving more power to the bot's engine, which then becomes indistinguishable from a browser.
+Systems that do not require user input can check whether they are being run in a proper web browser, by testing various features. However, they can be fooled by giving more power to the bot's engine (e.g. using [Selenium](https://www.selenium.dev/)), which then becomes indistinguishable from a browser.

 ### Proof of work
@@ -36,6 +36,26 @@ Proof of work imposes to solve problems that are long to solve, but fast to check

 If you are big enough to have a global database of real-time traffic per IP address (e.g. CloudFlare, Amazon, Google, etc.), you can detect spammy addresses and stop them immediately. However, such a centralized solution is not acceptable, as it gives too much power to gigantic corporations and creates single points of failure (see the large-scale CloudFlare and AWS outages in 2025). Decentralized and anonymous spam databases may be an interesting research subject, but they seem quite complicated and insufficient against sudden attacks using disposable IP addresses.

+### Political solutions
+
+Why is spamming even possible?
+
+* Being rich or being funded by banks allows people to acquire a huge amount of physical resources (computers, network links), even against the will of the community.
+* The business model of art makes it profitable to produce AI-generated garbage, thanks to advertising and big platforms that do not fulfill the role of a proper editor, to the detriment of artists.
+* Poor people are encouraged to join botnets of residential proxies to earn a few dollars a month by proxying requests.
+
+Here are some solutions:
+
+* Investments that have considerable consequences for other people should be discussed (and potentially vetoed) by all the people involved, including beneficiaries, workers, suppliers, neighbours and, in this case, Internet users.
+* Everyone should have enough income to live decently, unconditionally. Then, nobody will be forced to sell bandwidth or labour for AI training in order to survive.
+* Artists should be funded for their work, not for what they sell. Then, producing low-quality content cannot be more profitable than making art you love.
+* Trusts should be divided into smaller, decentralized entities. Big Internet service providers, hosting companies and social media platforms have too much power over the economy and culture. The same goes for entertainment companies, which produce blockbusters and reduce art's diversity.
+* Free software should be funded as a common good or public service, so end users are not forced to see AI and ads appear after updating their system.
+
+All this is part of what I call socialism (and, to clarify, there is no dictatorship involved).
+
+Sadly, we also need short-term solutions, so let's introduce a more technical one.
+
 ## Toward a decentralized and privacy-sound solution

 We will explore ways to provide global human-wise rate-limiting without a central entity, while respecting privacy.
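+
+To give a rough idea of what "human-wise" rate-limiting could look like, here is a hypothetical sketch that mirrors the per-IP counter above but keys the counter on the linking tag of a linkable ring signature: two signatures produced by the same ring member carry the same tag, so a client can be throttled without revealing which member it is. The `verify_lrs` function, the window length and the threshold are placeholders; an actual linkable ring signature scheme has to be plugged in.
+
+```python
+import time
+from collections import defaultdict
+
+WINDOW = 3600     # length of a counting epoch, in seconds (placeholder value)
+MAX_REQUESTS = 50 # requests allowed per linking tag and per epoch (placeholder value)
+
+counters: dict[tuple[bytes, int], int] = defaultdict(int)
+
+def verify_lrs(message: bytes, signature: bytes, ring: list[bytes]) -> bytes | None:
+    """Placeholder: verify a linkable ring signature over `message` for the given
+    ring of public keys and return its linking tag, or None if the signature is invalid."""
+    raise NotImplementedError("plug in a real linkable ring signature scheme here")
+
+def allow(message: bytes, signature: bytes, ring: list[bytes]) -> bool:
+    """Throttle per linking tag: same signer, same tag, but the signer stays anonymous."""
+    tag = verify_lrs(message, signature, ring)
+    if tag is None:
+        return False
+    epoch = int(time.time() // WINDOW)
+    counters[(tag, epoch)] += 1
+    return counters[(tag, epoch)] <= MAX_REQUESTS
+```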
@@ -55,3 +75,26 @@ Example cost: https://eprint.iacr.org/2024/553.pdf 29kB per signature with rings

 ### PrivacyPass
 https://datatracker.ietf.org/doc/html/rfc9576
+https://www.rfc-editor.org/rfc/rfc9577.html
+
+* Joint Attester, Issuer, Origin
+  * Attestation request must be LRS.
+  * Token can be anything.
+* Joint Attester, Issuer
+  * Attestation request must contain a tmp pk, signed by LRS.
+  * Token must be a certificate of the tmp pk.
+* Joint Issuer, Origin
+  * Attestation request must be LRS.
+  * Attestation request must contain a tmp pk, signed by LRS.
+  * Token can be anything.
+
+Retained architecture: joint issuer and origin.
+
+* Client generates tmp kp
+* Client sends attestation request with tmp pk, signed by LRS
+* Attester responds with timestamped tmp pk certificate (attester pk + tmp pk + time + sig = 136 bytes)
+  * no need to be PQ here, as the certificate is short-lived
+* Client makes request to the server
+* Server (issuer) asks for certificate
+* Client sends certificate
+* Server (issuer) responds with token
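+
+To make the certificate step above concrete, here is a hypothetical sketch of the attester and of the verification on the issuer/origin side, using the pyca/cryptography library. It assumes Ed25519 for the attester's signature, which is consistent with the 136 bytes quoted above (32-byte attester pk + 32-byte tmp pk + 8-byte timestamp + 64-byte signature); the function names are illustrative, and the LRS check on the attestation request as well as the final token issuance are left out.
+
+```python
+import struct
+import time
+
+from cryptography.hazmat.primitives import serialization
+from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey, Ed25519PublicKey
+
+def raw(pk) -> bytes:
+    """Serialize an Ed25519 public key to its raw 32-byte form."""
+    return pk.public_bytes(serialization.Encoding.Raw, serialization.PublicFormat.Raw)
+
+def attest(attester_sk: Ed25519PrivateKey, tmp_pk: bytes) -> bytes:
+    """Attester: certify a client's temporary public key (the LRS check on the
+    attestation request is assumed to have happened before this point)."""
+    ts = struct.pack(">Q", int(time.time()))
+    sig = attester_sk.sign(tmp_pk + ts)
+    return raw(attester_sk.public_key()) + tmp_pk + ts + sig  # 32 + 32 + 8 + 64 = 136 bytes
+
+def check(cert: bytes, trusted_attester_pk: bytes, max_age: int = 300) -> bytes:
+    """Issuer/origin: verify the short-lived certificate, return the temporary public key."""
+    apk, tmp_pk, ts, sig = cert[:32], cert[32:64], cert[64:72], cert[72:]
+    assert apk == trusted_attester_pk, "unknown attester"
+    Ed25519PublicKey.from_public_bytes(apk).verify(sig, tmp_pk + ts)  # raises if forged
+    assert time.time() - struct.unpack(">Q", ts)[0] <= max_age, "certificate expired"
+    return tmp_pk
+
+# Client generates a temporary key pair, the attester certifies it,
+# and the issuer later checks the certificate before handing out a token.
+attester_sk = Ed25519PrivateKey.generate()
+client_tmp_sk = Ed25519PrivateKey.generate()
+cert = attest(attester_sk, raw(client_tmp_sk.public_key()))
+assert len(cert) == 136 and check(cert, raw(attester_sk.public_key())) == raw(client_tmp_sk.public_key())
+```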