Foo - Screwing with spam bots



cuda2k
02-06-08, 03:24 PM
Anyone who has created any sort of website that isn't 100% static HTML in the last decade or longer has probably encountered some sort of malicious activity from an automated spam bot crawling across the web looking for websites to fill with all sorts of crap. Years ago I added a simple guestbook page to my website that allowed visitors to write a message, which was then displayed in a list. Within weeks I was constantly adding to the list of 'filtered' words to keep the spam out.

With my current website, a user is required to create a user account, sign in and all that before being allowed to post comments, and additional checks are in place to keep the vast majority of the data safe from such pests. Earlier this week I (finally) added some custom error handling code that sends me an email any time a visitor encounters an error on the website. Suddenly, within minutes of deploying these changes, my inbox was starting to fill up with mail. I was worried at first that users were actually hitting errors all over the site, till I looked closer at the request addresses. Spam bots. 99.5% of them. Attempts at submitting page requests with URLs embedded in the query strings and the like. Of course my code was blowing up on that sort of thing, since it isn't expecting an http string there. I wasn't so worried that these spammers were getting the error page, but that my inbox was getting full, and fast.

So I've spent the last few evenings making some changes to the request handling and error handling of the website. Since I have confirmed that no valid request string on the website will contain 'http://' within it, I am now redirecting those to the definition of 'BLARG' at UrbanDictionary.com. No longer even allowing them to get to the error page. For less obviously spam-related bad requests I'm redirecting to Google and a few other redirection destinations. Haven't got a new error email all day long, other than 2 that I created myself just to verify that valid errors were still being caught.
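
A rough Python sketch of the redirect rule described above, not the site's actual code. The function name, the `known_paths` check used to stand in for "less obviously spam-related bad requests", and both destination URLs are assumptions made for illustration.

```python
from urllib.parse import urlsplit, unquote

# Hypothetical destinations: the post names the UrbanDictionary 'BLARG'
# page and Google, but not the exact URLs used.
SPAM_REDIRECT = "https://www.urbandictionary.com/define.php?term=BLARG"
FALLBACK_REDIRECT = "https://www.google.com/"


def redirect_for(request_url, known_paths):
    """Return a redirect URL for a bad request, or None to handle it normally."""
    parts = urlsplit(request_url)
    # Decode percent-escapes so 'http%3A%2F%2F' is caught as well.
    query = unquote(parts.query).lower()
    if "http://" in query:
        # No valid request on the site carries a URL in its query string.
        return SPAM_REDIRECT
    if parts.path not in known_paths:
        # Assumption: an unknown path counts as a "less obvious" bad request.
        return FALLBACK_REDIRECT
    return None
```

Running this check before the normal page handler means the bad request never reaches the error page, so it never triggers an error email.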


jsharr
02-06-08, 03:28 PM
i like pie.

StupidlyBrave
02-06-08, 03:35 PM
Even if it is straight HTML... Check your web logs... CodeRed-like attacks and misconfigured IIS cgi directories are still being requested. At least they were the last time I lit up my Apache server.

Did you consider a captcha (http://en.wikipedia.org/wiki/Captcha)?


PATH
02-06-08, 03:43 PM
Doesn't everybody love SPAM? So what if Bots make it!:D



http://i75.photobucket.com/albums/i283/PATH_photos/spam-1.jpg

hos13
02-06-08, 03:43 PM
We process around 2 million emails in a 24-hour period; less than 1 percent of it is legit email. Dan Bernstein has written some interesting material on spam and how to solve it, and thus far I think he may be correct. However, it will never happen: it would be similar to the PSTN, where email exchanges would bill other exchanges for routing mail through them.

mlts22
02-06-08, 03:49 PM
On my websites, I use a script called wpoison, which gives web bots that ignore the robots.txt file oodles of seemingly valid, but absolutely bogus E-mail addresses. Some spam bots will happily run down random links the wpoison CGI URL gives for hours and hours, slurping up hundreds of thousands of bogus E-mail addresses.
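
To illustrate the idea only (this is not wpoison's code), here is a small Python sketch that generates throwaway addresses. The function name and parameters are made up, and the addresses use the reserved `.example` top-level domain so they can never hit a real mailbox, whereas wpoison aims for addresses that look valid.

```python
import random
import string


def bogus_addresses(count, seed=None):
    """Return `count` random, undeliverable email addresses."""
    rng = random.Random(seed)  # seedable so the output can be reproduced
    letters = string.ascii_lowercase
    addresses = []
    for _ in range(count):
        user = "".join(rng.choice(letters) for _ in range(rng.randint(5, 10)))
        host = "".join(rng.choice(letters) for _ in range(rng.randint(4, 8)))
        # '.example' is reserved, so these never resolve to a real domain.
        addresses.append(f"{user}@{host}.example")
    return addresses
```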

This makes the spammer E-mail databases get full of nonworking, useless addresses which they can't really filter out. Combine this with some form of tarpit Apache module (which slows down HTTP requests exponentially after they hit a certain threshold), and this can occupy a spam harvester bot for a long while.
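
The tarpit behaviour described above (delays that grow exponentially past a threshold) might look something like this in Python. It is a sketch under assumed numbers, not the logic of any particular Apache module; the threshold, base delay, and cap are all hypothetical.

```python
def tarpit_delay(request_count, threshold=20, base_delay=0.5, max_delay=60.0):
    """Seconds to stall a client that has made `request_count` requests."""
    if request_count <= threshold:
        return 0.0  # well-behaved clients are not slowed at all
    over = request_count - threshold
    # Double the delay for each request past the threshold. The exponent is
    # capped so very large counts cannot overflow a float.
    exponent = min(over - 1, 32)
    return min(base_delay * 2 ** exponent, max_delay)
```

With these defaults the 21st request waits half a second, the 22nd a full second, the 23rd two seconds, and so on until the 60-second cap.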

Tude
02-06-08, 03:51 PM
On my websites, I use a script called wpoison, which gives web bots that ignore the robots.txt file oodles of seemingly valid, but absolutely bogus E-mail addresses. Some spam bots will happily run down random links the wpoison CGI URL gives for hours and hours, slurping up hundreds of thousands of bogus E-mail addresses.

This makes the spammer E-mail databases get full of nonworking, useless addresses which they can't really filter out. Combine this with some form of tarpit Apache module (which slows down HTTP requests exponentially after they hit a certain threshold), and this can occupy a spam harvester bot for a long while.

oooo I likes that one!

Air
02-06-08, 04:52 PM
Me too!!

cuda2k
02-06-08, 06:20 PM
I had a coworker who worked for Match.com before joining the team. He had all sorts of stories about how he'd screw with scammers. One involved dealing with programs designed to automatically download the content from Match.com to set up look-alike scam sites. Once identified, those programs were instead fed a single profile hundreds of thousands to millions of times. And let's just say this profile wouldn't bring in a lot of interested men.

blue_neon
02-06-08, 06:26 PM
Nerds.

Hickeydog
02-06-08, 06:54 PM
Nerds.

Incorrect. The correct term would be geek. Specifically, an Internet geek. A nerd is one who attempts to know everything possible. A geek is one who specializes in a specific field, and is a self-trained expert in that field.

Jerseysbest
02-06-08, 07:27 PM
Incorrect. The correct term would be geek. Specifically, an Internet geek. A nerd is one who attempts to know everything possible. A geek is one who specializes in a specific field, and is a self-trained expert in that field.

http://www.execupundit.com/uploaded_images/star-trek-inspirational-poster-724610.jpg

A Nerd would know the difference between a Nerd and a Geek.

blue_neon
02-06-08, 07:35 PM
Lol

East Hill
02-06-08, 08:20 PM
On my websites, I use a script called wpoison, which gives web bots that ignore the robots.txt file oodles of seemingly valid, but absolutely bogus E-mail addresses. Some spam bots will happily run down random links the wpoison CGI URL gives for hours and hours, slurping up hundreds of thousands of bogus E-mail addresses.

This makes the spammer E-mail databases get full of nonworking, useless addresses which they can't really filter out. Combine this with some form of tarpit Apache module (which slows down HTTP requests exponentially after they hit a certain threshold), and this can occupy a spam harvester bot for a long while.

Oooh, sneaky, clever, and devilish!

East Hill

RedHairedScot
02-06-08, 10:19 PM
I've always liked the teergrube: a host configured to be

v-e-r-y s-l-o-w

It works much better with a mailserver, but it works for anything that waits for and parses your reply. Wastes their time and keeps them from trashing other sites. (Well, slows down one thread, but you can only do what you can.)
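
For anyone curious, the slow-drip part of a teergrube can be sketched in a few lines of Python. This is only an illustration of the idea, with a made-up function name and made-up defaults; a real one would sit inside a mail or web server.

```python
import time


def drip(payload, chunk_size=1, pause=1.0, sleep=time.sleep):
    """Yield `payload` in tiny chunks, pausing before each one.

    `sleep` is injectable so the pacing can be checked without waiting.
    """
    for start in range(0, len(payload), chunk_size):
        sleep(pause)  # the client sits idle here, tying up its thread
        yield payload[start:start + chunk_size]
```

A server loop would write each yielded chunk to the socket, so a 200-byte reply at the defaults takes over three minutes to finish.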

Of course, the real solution is to extend the death penalty to spammers.

ManBearPig
02-06-08, 10:40 PM
i like pie.

I agree, I like pie too.

(You lost me at "website.")

mlts22
02-07-08, 01:34 AM
I've always liked the teergrube: a host configured to be

v-e-r-y s-l-o-w

It works much better with a mailserver, but it works for anything that waits for and parses your reply. Wastes their time and keeps them from trashing other sites. (Well, slows down one thread, but you can only do what you can.)

Of course, the real solution is to extend the death penalty to spammers.

I like the teergrube idea, but have been too lazy to implement it. Instead, I used SpamCannibal, which did a great job of reducing one domain's spam from 20,000 messages a day to a couple hundred. Eventually, I just got tired of the bandwidth drag, and moved the domain to a hosting ISP.

One pleasant surprise of Exchange 2007 -- it does tarpitting automatically.

iamlucky13
02-07-08, 11:01 PM
My site went 3 years after I added user commenting before spam really got to be a big problem.

Last month I finally buckled down and wrote a script to filter the spam. I'm stopping probably 99% of it now just by scoring posts based on their usage of certain words, and for the time being, storing the rest for analysis and to reaffirm how awesome I am at stopping mindless robots. In 4 weeks I've collected 614 of the little twerps.

As a tip, about half of them give no http_referer. That's a really easy way right there to filter out a big chunk of the human versus otherwise.
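
A minimal Python sketch of that kind of scoring, assuming a made-up word list, weights, and threshold (the real script's values aren't given here). It also treats a missing referer as one more signal rather than an automatic reject, since some legitimate browsers and privacy tools strip that header.

```python
import re

# Hypothetical weights; a real list would be tuned against collected spam.
SPAM_WORDS = {"viagra": 5, "casino": 4, "mortgage": 3}
NO_REFERER_PENALTY = 5
THRESHOLD = 5


def spam_score(text, referer):
    """Sum the weights of flagged words, plus a penalty for no referer."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(SPAM_WORDS.get(word, 0) for word in words)
    if not referer:
        score += NO_REFERER_PENALTY
    return score


def is_spam(text, referer, threshold=THRESHOLD):
    """True when the post's score reaches the threshold."""
    return spam_score(text, referer) >= threshold
```

Posts that score at or above the threshold can be stored for analysis instead of being published.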

"I receive a ton of spam every day. Much of it offers to help me get out of debt or get rich quick." ~Bill Gates