4/20/2025 at 7:34:24 AM
In the last week I've had to deal with two large-scale influxes of traffic on one particular web server in our organization.
The first involved requests from 300,000 unique IPs in a span of a few hours. I analyzed them and found that ~250,000 were from Brazil. I'm used to using ASNs to block network ranges sending this kind of traffic, but in this case they were spread thinly over 6,000+ ASNs! I ended up blocking all of Brazil (sorry).
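Roughly, the first pass over the logs looks something like this (a sketch: the log path, the MaxMind GeoLite2 databases and the combined log format are assumptions here, not necessarily what I used):

    # Count unique client IPs by country and ASN from an access log.
    import re
    from collections import Counter

    import geoip2.database  # pip install geoip2
    import geoip2.errors

    LOG_PATH = "/var/log/nginx/access.log"               # placeholder path
    COUNTRY_DB = "/var/lib/GeoIP/GeoLite2-Country.mmdb"  # placeholder path
    ASN_DB = "/var/lib/GeoIP/GeoLite2-ASN.mmdb"          # placeholder path

    ip_re = re.compile(r"^(\S+)")  # client IP is the first field in combined format

    ips = set()
    with open(LOG_PATH) as log:
        for line in log:
            m = ip_re.match(line)
            if m:
                ips.add(m.group(1))

    countries, asns = Counter(), Counter()
    with geoip2.database.Reader(COUNTRY_DB) as cdb, geoip2.database.Reader(ASN_DB) as adb:
        for ip in ips:
            try:
                countries[cdb.country(ip).country.iso_code] += 1
                asns[adb.asn(ip).autonomous_system_number] += 1
            except geoip2.errors.AddressNotFoundError:
                continue

    print(len(ips), "unique IPs")
    print("top countries:", countries.most_common(10))
    print("distinct ASNs:", len(asns))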
A few days later this same web server was on fire again. I performed the same analysis on the IPs and found a similar number of unique addresses, but this time spread across Turkey, Russia, Argentina, Algeria and many more countries. What is going on?! Eventually I found what I think is a pattern to identify the requests: they were all using ancient Chrome user agents (Chrome 40, 50, 60 and so on up to 90, all released 5 to 15 years ago). Then, just before I could implement a block based on these user agents, the traffic stopped.
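Spotting them is just a matter of pulling the Chrome major version out of each user agent, roughly like this (same placeholder log path; the 40-90 cutoff mirrors what I saw, not a general rule):

    # Count requests whose user agent claims an ancient Chrome version.
    # Assumes combined log format with the user agent as the last quoted field.
    import re
    from collections import Counter

    LOG_PATH = "/var/log/nginx/access.log"  # placeholder path
    ua_re = re.compile(r'"([^"]*)"\s*$')    # last quoted field = user agent
    chrome_re = re.compile(r"Chrome/(\d+)\.")

    old_chrome = Counter()
    with open(LOG_PATH) as log:
        for line in log:
            ua = ua_re.search(line)
            if not ua:
                continue
            chrome = chrome_re.search(ua.group(1))
            # Chrome 40-90 shipped roughly 2015-2021; real browsers that old are rare
            if chrome and 40 <= int(chrome.group(1)) <= 90:
                old_chrome[int(chrome.group(1))] += 1

    print(old_chrome.most_common())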
In both cases the traffic from datacenter networks was limited because I already rate limit a few dozen of the larger ones.
Sysadmin life...
by aorth
4/20/2025 at 8:47:01 AM
Try Anubis: <https://anubis.techaro.lol>
It's a reverse proxy that presents a proof-of-work challenge to every new visitor, shifting the initial cost of accessing your server's resources back onto the client. Assuming your uplink can handle 300k clients requesting a single 70 kB web page, it should solve most of your problems.
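For illustration, the underlying idea is roughly this (a generic sketch, not Anubis's exact scheme): the server hands out a random challenge, the client grinds out a nonce whose hash has N leading zero bits, and the server verifies the answer with a single cheap hash.

    # Generic proof-of-work challenge: expensive to solve, cheap to verify.
    import hashlib
    import os


    def make_challenge() -> str:
        return os.urandom(16).hex()


    def _check(challenge: str, nonce: int, difficulty_bits: int) -> bool:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        return int.from_bytes(digest, "big") >> (256 - difficulty_bits) == 0


    def solve(challenge: str, difficulty_bits: int = 16) -> int:
        """Client side: brute-force a nonce (the expensive part)."""
        nonce = 0
        while not _check(challenge, nonce, difficulty_bits):
            nonce += 1
        return nonce


    def verify(challenge: str, nonce: int, difficulty_bits: int = 16) -> bool:
        """Server side: one hash per answer, so verification stays cheap."""
        return _check(challenge, nonce, difficulty_bits)


    if __name__ == "__main__":
        c = make_challenge()
        n = solve(c)  # ~2^16 hashes on average at 16 bits
        print(c, n, verify(c, n))

Tuning the difficulty trades a little first-visit latency for making each of those 300k clients pay up front before they can touch your backend.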
For science, can you estimate your peak QPS?
by rollcat
4/20/2025 at 1:39:30 PM
Anubis is a good choice because it whitelists legitimate, well-behaved crawlers based on IP + user-agent. Cloudflare works as well in that regard, but then you're MITM-ing all your visitors.
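The gist of the IP + user-agent idea (just the general notion, not Anubis's actual policy format) is a forward-confirmed reverse DNS check for a crawler that claims to be, say, Googlebot:

    # Let a claimed Googlebot through only if its IP reverse-resolves to a Google
    # hostname and that hostname resolves back to the same IP (Google's documented check).
    import socket


    def is_verified_googlebot(ip: str) -> bool:
        try:
            host, _, _ = socket.gethostbyaddr(ip)           # reverse DNS
        except socket.herror:
            return False
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        try:
            return ip in socket.gethostbyname_ex(host)[2]   # forward-confirm
        except socket.gaierror:
            return False


    def should_challenge(ip: str, user_agent: str) -> bool:
        if "Googlebot" in user_agent and is_verified_googlebot(ip):
            return False  # verified, well-behaved crawler: skip the challenge
        return True       # everyone else gets the proof-of-work page

by marginalia_nu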
4/20/2025 at 5:23:45 PM
Also, I was just watching a Brodie Robertson video about how the United Nations has this random UNESCO search page which actually runs Anubis.
Crazy how I remember the HN post where Anubis's blog post was first shared. Though I always thought it was a bit funny with the anime, and that it came out of frustration with (I think AWS?) AI scrapers that wouldn't follow the general rules and kept hammering his git server, actually taking it down, I guess. I didn't expect it to blow up to ... the UN.
by Imustaskforhelp
4/20/2025 at 5:33:19 PM
Her*
It was frustration at AWS' Alexa team and their abuse of the commons. Amusingly, if they had replied to my email before I wrote my shitpost of an implementation, this all could have turned out vastly differently.
by xena
4/21/2025 at 6:52:47 AM
Oh, I am so, so sorry, I didn't see your gender and assumed it to be (he). (Really sorry about that once again.)
Also didn't expect you to respond to my comment xD
While reading this comment I slowly realized that you are the creator of Anubis, and I had such a smile when I realized that you had replied to me.
Also, this project is really nice, but I actually want to ask: I haven't read the Anubis docs, but could it be that the proof of work isn't wasted / that it could be used for something? (I know I might get downvoted because I'm going to mention cryptocurrency, but the Nano currency requires a proof of work for each transaction, so if Anubis did its proof of work to Nano's standards, then theoretically that proof of work could at least be somewhat useful.)
Looking forward to your comment!
by Imustaskforhelp
4/22/2025 at 3:12:49 PM
Useful as a for-profit cryptocurrency? I think there's zero chance.
The only way I see anything like that being incorporated is as a folding@home kind of thing that could help humanity as a whole.
Of course, if someone makes it work like you suggested, and it catches on, I will personally haunt your dreams forever. Don't give them any ideas.
by akaij
4/21/2025 at 2:55:50 AM
This looks very cool, but isn't it just a matter of months until all scrapers get updated so they can easily beat this challenge and compute modern JS stuff?
by martin82
4/20/2025 at 11:32:58 PM
My company's site has also been getting hammered by Brazilian IPs. They're focused on a single filterable table of fewer than 100 rows, querying it with various filter combinations every second of every minute of every day.
by nodogoto
4/20/2025 at 9:13:14 AM
I've seen a few attacks where the operators placed malicious code on high-traffic sites (e.g. some government thing, larger newspapers) and then just let visitors' browsers load your site as an img. Did you see images, CSS, or JS being loaded from these IPs? If they were expecting images, they wouldn't parse the HTML or load other resources.
It's a pretty effective attack because you get large numbers of individual browsers to contribute. Hosting providers don't care, so unless the site owners are technical enough, the malicious code can stay online for quite a while.
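A rough way to check that from the logs (combined format, placeholder path and thresholds) is to flag IPs that keep requesting pages but never fetch a single subresource:

    # Flag client IPs that request HTML pages but never any css/js/image assets,
    # which is what an <img>-style embed of your site would look like.
    import re
    from collections import defaultdict

    LOG_PATH = "/var/log/nginx/access.log"  # placeholder path
    line_re = re.compile(r'^(\S+).*?"(?:GET|POST) (\S+)')
    asset_re = re.compile(r"\.(css|js|png|jpe?g|gif|webp|svg|ico|woff2?)(\?|$)", re.I)

    counts = defaultdict(lambda: {"pages": 0, "assets": 0})
    with open(LOG_PATH) as log:
        for line in log:
            m = line_re.match(line)
            if not m:
                continue
            ip, path = m.groups()
            counts[ip]["assets" if asset_re.search(path) else "pages"] += 1

    suspects = [ip for ip, c in counts.items() if c["pages"] >= 10 and c["assets"] == 0]
    print(len(suspects), "IPs requested pages but never a subresource")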
If they work with a Referrer Policy, they should be able to mask themselves fairly well; the ones I saw back then did not.
by luckylion
4/20/2025 at 11:10:31 PM
I seem to remember a thing China did 10 years back where they injected JavaScript into every web request that went through their Great Firewall to target GitHub… I think it's known as the "Great Cannon", because they can basically make every Chinese internet user's browser hit your website in a DoS attack.
Digging it up: https://www.washingtonpost.com/news/the-switch/wp/2015/04/10...
by ninkendo
4/21/2025 at 3:22:25 PM
Wow, that had passed me by completely, thanks for sharing!
Very similar indeed. The attacks I witnessed were easy to block once you identified the patterns (the referrer was visible and they used predictable ?_=... query parameters to try to bypass caches), but they were very effective otherwise.
I suppose in the event of a hot war, the Internet will be cut quickly to defend against things like the "Great Cannon".
by luckylion