Originally Posted by
Duragrouch
I'd be curious to know if the BF servers can sense a web crawler and disengage with or block...
I imagine that would turn into a game of whack-a-mole. Any web crawler large enough to effectively execute a denial-of-service attack on the BF server(s) would likely be coming from multiple places at once (or nearly at once).
The BF admins should be able to identify the culprits and threaten to block their ISPs entirely. Whatever it is, it certainly does not appear to be well-behaved. OTOH, the BF robots.txt file is pretty skimpy: it disallows the ChatGPT bot, a few specific pages, and some Google thing I don't recognize as a typical search-engine crawler, and it imposes a one-second delay only on Bing's crawler. If clients are ignoring the specific Disallow entries (which mostly look like they would trigger database activity), they should be blocked outright. If crawlers other than Bing might hit the server too hard, I'd also add a larger crawl delay and apply it to all crawlers, something like
User-agent: *
Crawl-delay: 100
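Worth noting that robots.txt is purely advisory: a crawler that ignores Disallow entries will ignore Crawl-delay too (and Crawl-delay is nonstandard to begin with; Google's crawler doesn't honor it at all). Actually stopping a misbehaving bot has to happen server-side. A minimal sketch, assuming the site sits behind nginx and the bot sends a recognizable User-Agent string (both of which are assumptions on my part; the bot name below is a placeholder):

map $http_user_agent $bad_bot {
    default       0;
    ~*BadCrawler  1;   # placeholder -- substitute the offending agent string
}

# Allow roughly one request per second per client IP, tracked in a 10 MB zone
limit_req_zone $binary_remote_addr zone=perip:10m rate=1r/s;

server {
    listen 80;

    location / {
        if ($bad_bot) {
            return 403;                # hard-block the misbehaving crawler
        }
        limit_req zone=perip burst=5;  # throttle everyone else
        # ... normal proxy/fastcgi config here ...
    }
}

Blocking by User-Agent only works until the crawler starts lying about its identity, of course; at that point you're back to blocking by IP range or leaning on the ISP.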
Another step might be to beef up the hardware. I don't know what Internet Brands uses, but in this day and age it seems their servers should be hosted in the cloud, with capacity that can expand in near real time as demand requires.