Annoying bot traffic

posted by Jeff | Thursday, June 5, 2025, 5:00 PM | comments: 0

I spend a little more for proper redundancy on my sites, because I believe that's important for the folks that choose to use their distraction time with me. Also, obviously, that's my kind of nerd stuff. Sure, the ad revenue doesn't cover it, but I can't not provide a high level of service. (Also, it's worth mentioning that back in the day I could pay my mortgage on 30k daily ad impressions.) I pride myself on how fast it all is, and the up-time, especially compared to the days when it all ran on a single rented server. The forum app powering the PointBuzz forums normally runs on two instances using only 1.75 GB of memory, with a working set of 500 MB.

That all works fine, because typical traffic to that app is around 2,000 requests per hour, which is nothing. However, I've been dealing with a lot of bots originating from Alibaba servers in Hong Kong, Singapore and sometimes China or India. Sometimes it's a single machine in Google's or Amazon's cloud in the US (not the search engine). They go nuts and generate 100,000 requests per hour. That works out to about 28 requests per second, which is also not what I would consider "high," but it does push the limits of what that tiny amount of memory can handle because it's so bursty. By that, I mean the requests are not uniformly distributed over time. It can slow things down, and sometimes generate errors for people.
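To put those numbers side by side, here's a rough sketch of the arithmetic in Python. The two hourly totals come from above. The idea that all of the bot traffic lands in 10% of the hour is purely an assumption I made up to illustrate what "bursty" does to the peak rate, and the function names are just for illustration.

```python
def per_second(requests_per_hour: float) -> float:
    """Average request rate if traffic were spread evenly over the hour."""
    return requests_per_hour / 3600


def burst_rate(requests_per_hour: float, active_fraction: float) -> float:
    """Rate during the busy periods if all requests land in a fraction of the hour."""
    return requests_per_hour / (3600 * active_fraction)


normal = per_second(2_000)    # typical forum traffic
bots = per_second(100_000)    # bot traffic, averaged over the hour

print(f"normal: {normal:.2f} req/s")  # normal: 0.56 req/s
print(f"bots:   {bots:.1f} req/s")    # bots:   27.8 req/s

# Hypothetical: the same 100,000 requests squeezed into 10% of the hour.
print(f"burst:  {burst_rate(100_000, 0.1):.1f} req/s")  # burst:  277.8 req/s
```

The average looks tame, but the same hourly total packed into short windows is a very different load for a small instance.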

I can scale it up to 3.5 GB of memory, and everything is again fine. In fact, there's probably enough headroom at that point to do hundreds of requests per second. I don't actually know what the upper limit is there. But it's also the difference between spending $25 a month and $50 a month. I'm already spending $72 on the two "premium" instances running all of CoasterBuzz and the non-forum part of PointBuzz (as well as this blog and a number of other things), $110 for all of the databases and small amounts for Redis, ElasticSearch, Functions, etc. The database has yet to be overwhelmed, fortunately, so I haven't had to scale that up.

The bots are annoying, but if they get really ugly, it's easy enough to see where they're coming from, and block them. Alibaba is especially easy, because they come from predictable ranges of IPs, and always in East Asia. The one-offs are the more annoying ones, because any idiot can spin up a bunch of ephemeral machines and run a script to scrape the sites. Between the two sites, there are hundreds of thousands of pages, so there's a lot to hit. It's great for long-tail Google juice, but not great for rogue crawlers.
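Blocking by range is conceptually simple. Here's a minimal sketch in Python using the standard `ipaddress` module. It is not how my sites actually do it, and the ranges below are placeholders from the reserved documentation blocks, not real Alibaba ranges. The `BLOCKED_RANGES` and `is_blocked` names are hypothetical.

```python
import ipaddress

# Placeholder CIDR ranges (reserved for documentation), NOT real bot ranges.
BLOCKED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(ip: str) -> bool:
    """Return True if the address falls inside any blocked range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in BLOCKED_RANGES)


print(is_blocked("203.0.113.42"))  # True
print(is_blocked("192.0.2.1"))     # False
```

That handles the predictable offenders. The ephemeral one-offs are harder precisely because they don't stay inside a range you can list ahead of time.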

