Imagine you want to leave a comment on a blog. Next, imagine you have to solve one of those captchas — a jumbled image of letters and numbers — to prove you’re human. Now imagine that you can’t see. You have to solve an audio captcha instead, such as the ones below. All you have to do is work out what the sequence of numbers is in each one:



MP3 | OGG



MP3 | OGG



MP3 | OGG

Rather tricky, huh? And these are genuine examples, not edited or downsampled.

Like most people, I needed a way to block spam on my WordPress blog but I refuse to use captchas. Spam is not my readers’ problem and as you can see, captchas are an extra hassle for everyone. Hard for people with good eyesight and even harder for people with poor or no eyesight (which includes all of us at some point in the future). Clearly we need to find a better alternative…

Question-and-answer captcha

These are of the form “what is two plus seven?” or “in which season do leaves fall?”. With text like this a blind person can hear it with software called a screen reader, which is good, but it has to be in a language they understand, which is not good. My blog is in English and Japanese so this a problem. There could be multiple answers, too; leaves can fall in the autumn or in the fall.

An extra form field hidden with CSS

This involves a field that should stay empty, possibly with a message saying “leave empty”. The idea is that spam bots will automatically fill in every field they come across but humans won’t. CSS can be used to hide the field from humans but it’s still “visible” to screen reader users or those with user styles applied (or none at all). Also, I tried this out for a while but in practice it wasn’t very effective — the spam bots are too smart.

Third-party spam-blocking service

Akismet is the most widely-used but there are others. They work by sending each comment to a remote server where it is analysed and classified as spam or genuine. This method seems to be effective but you have to register and possibly pay depending on your spam-blocking needs. It’s also not ideal in an environment where comments should be private.

IP address blacklists

Whether using your own list or somebody else’s, it’s easy to block comments from IP addresses that are known to send spam. I don’t like this for two reasons, though:

Several users can share a single IP address, for example through a VPN, or spammers could be sending comments from a machine they’ve secretly gained access to. I don’t want to penalise innocent users.

Looking through some of my spam comments I couldn’t see any from the same IP address so I doubt this approach would be very effective.

The answer?

Then I came across some spam bot research and a nice way to block them by Ned Batchelder (and probably others too). As most spam comments are sent rapidly from remote servers, we can add a unique token to our comment form and then check it’s still there when a comment is posted. I tried this and sure enough it blocked about two-thirds of spam — still room for improvement but it’s a good starting point. After much tweaking and experimenting I found that further analysis of a comment could catch most of the remaining spam, such as the delay between page load and posting, the ratio of text to links and the similarity of comment text to blog post text. Finally I’m satisfied. Out of the last 100 spam comments on this blog, 98 of them were detected and blocked by the filter. It’s not quite perfect and it may become less effective over time but I’m happy to make that compromise and keep my readers happy.

I’ve put this together into a spam-blocking plugin for WordPress and you can see the source code here. It’s designed to be invisible for users and maintenance-free for site owners. If you try it and like it, please give it a nice review in the WordPress directory so I can get that warm fuzzy feeling!

UPDATE

Since writing this I’ve seen spam increase again despite my plugin. I was going to try to update it but I’ve since discovered (thanks Otto) another WordPress plugin, Cookies for Comments, which is much better and works using a similar concept. This is my recommendation for all WordPress users.