alt.hn

1/17/2025 at 4:49:47 PM

Amazon's AI crawler is making my Git server unstable

https://xeiaso.net/notes/2025/amazon-crawler/

by bitbasher

1/17/2025 at 5:19:26 PM

How much of a problem is it?

by JSTrading

1/17/2025 at 5:27:04 PM

3 TiB of egress and climbing. I'm in the hole financially, and it's making my personal infra that relies on it unstable.

by xena

1/17/2025 at 6:40:12 PM

Damn, that's rude. At least you appear to be using Vultr. Imagine if it was running on one of those newfangled cloud providers that mark bandwidth up by a few orders of magnitude...

by jsheard

1/17/2025 at 6:43:43 PM

It's actually slightly worse. That Vultr node is a reverse proxy over WireGuard to my homelab.

by xena

1/17/2025 at 10:26:41 PM

Remove the Gitea instance for now until it's sorted out? Or respond to all git.* traffic with a 420 until then.

by bitbasher
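A rough sketch of the "answer git.* with a 420" idea, written as a Python WSGI middleware purely for illustration. The names `block_git_hosts` and the response body are hypothetical, and in this setup the check would more likely live in the reverse proxy on the Vultr node. Note that 420 is not a registered HTTP status (it is Twitter's old "Enhance Your Calm"); 429 Too Many Requests or 503 Service Unavailable are the standard choices.

```python
from typing import Callable, Iterable

def block_git_hosts(app: Callable) -> Callable:
    """Hypothetical WSGI middleware: refuse every request whose Host header
    starts with "git." and pass everything else through to the wrapped app."""
    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        host = environ.get("HTTP_HOST", "").lower()
        if host.startswith("git."):
            body = b"Enhance your calm\n"
            # Non-standard status; swap for "429 Too Many Requests" if preferred.
            start_response("420 Enhance Your Calm", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)
    return middleware
```

Wrapping the real application (`application = block_git_hosts(application)`) leaves non-git hosts untouched, so the rest of the site keeps working while the Git frontend is shut off.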

1/17/2025 at 8:00:09 PM

Have you verified that they are actually Amazon crawlers as outlined here:

https://developer.amazon.com/amazonbot

by alphan0n
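The usual way to check that a request really comes from a search or AI crawler is forward-confirmed reverse DNS: look up the PTR name for the IP, check that it sits under the crawler's domain, then resolve that name forward and confirm it maps back to the same IP. Below is a minimal Python sketch of that check. The suffix `.crawl.amazonbot.amazon` is an assumption based on Amazon's Amazonbot page and should be confirmed there; the helper names are hypothetical, and `gethostbyname_ex` only returns IPv4 addresses, so IPv6 clients would need `getaddrinfo` instead.

```python
import socket
from typing import Callable, Iterable

# Assumption: genuine Amazonbot IPs reverse-resolve to a subdomain of this name.
AMAZONBOT_SUFFIX = ".crawl.amazonbot.amazon"

def reverse_lookup(ip: str) -> str:
    # PTR lookup: primary hostname registered for the IP.
    return socket.gethostbyaddr(ip)[0]

def forward_lookup(hostname: str) -> Iterable[str]:
    # A-record lookup: every IPv4 address the hostname resolves to.
    return socket.gethostbyname_ex(hostname)[2]

def is_verified_amazonbot(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], Iterable[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for a claimed Amazonbot IP."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror both subclass OSError
        return False
    if not hostname.endswith(AMAZONBOT_SUFFIX):
        return False
    try:
        # The PTR record alone can be spoofed; the forward lookup must agree.
        return ip in forward(hostname)
    except OSError:
        return False
```

The resolvers are injectable so the logic can be exercised without touching the network; in production the defaults call the system resolver, and results are worth caching since each check costs two DNS round trips.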

1/17/2025 at 8:23:30 PM

Yes.

by xena

1/17/2025 at 8:29:29 PM

Can you share the list of offending IPs, here or on your website, so we can use them in block lists?

Also, there is an email to contact:

amazonbot@amazon.com

by alphan0n

1/17/2025 at 9:12:29 PM

The website that you were pummeling is in the article: git.xeserv.us. I sent an email earlier today and have gotten no response.

Right now your crawler bots are getting the Bee Movie script, so you may want to delete all the data that's being scraped from that domain. Unless you like jazz, that is.

It'd be a gesture of good faith to remunerate me for the egress fees your bot incurred, but I'm not gonna die on that hill.

by xena

1/17/2025 at 10:21:47 PM

Apologies, I’m not affiliated with Amazon in any way.

I meant the Amazon IP addresses that are causing you trouble, so I can preemptively block them.

by alphan0n
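If someone did publish such a list, checking incoming addresses against it is straightforward with Python's standard `ipaddress` module. This is only a sketch: no actual Amazon ranges appear in this thread, so the example entries below are the reserved documentation prefixes (192.0.2.0/24, 198.51.100.0/24, 2001:db8::/32) standing in as placeholders, and `load_blocklist` / `is_blocked` are hypothetical names.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

def load_blocklist(lines: Iterable[str]) -> List[Network]:
    """Parse CIDR entries, skipping blank lines and '#' comments."""
    networks = []
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        # strict=False tolerates entries with host bits set, e.g. "198.51.100.7/24".
        networks.append(ipaddress.ip_network(entry, strict=False))
    return networks

def is_blocked(ip: str, networks: List[Network]) -> bool:
    """True if the address falls inside any listed network.
    An IPv4 address is never 'in' an IPv6 network, and vice versa."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

# Placeholder ranges only; substitute a real, verified list.
blocklist = load_blocklist([
    "# hypothetical crawler ranges",
    "192.0.2.0/24",
    "",
    "2001:db8::/32",
    "198.51.100.7/24",
])
```

A linear scan is fine for a few hundred entries; for large lists a firewall set (or collapsing ranges with `ipaddress.collapse_addresses`) scales better than checking each request in application code.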