Exclude-by-Keyword: Thoughts on Spam and Robots.txt

Note: This solution is for spam that cannot be filtered. There are already wonderful tools for fighting comment, forum, and wiki spam, such as LinkSleeve and Akismet. However, the method proposed here would also cover more nefarious techniques such as HTML injection, XSS, and parasitic hosting.

Truth be told, I rarely use the Robots.txt file. Its functionality can largely be replicated on a page-by-page basis via the robots META tag and, frankly, we spend far more time getting pages into the SERPs than keeping them out.
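For anyone who hasn't used it, the page-level version is just one standard tag in the page's head, with noindex and nofollow as the usual directives:

<meta name="robots" content="noindex, nofollow" />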

However, after creating and running several large communities with tons of user-generated content, I realized that the Robots.txt file could offer much more powerful exclusion tools. Essentially: exclude-by-keyword.

The truth is, there is no reason for the phrase “cheap cialis” to appear on my informational site about pets. If that phrase occurs anywhere on my site, it is because I was spammed.

So why not create a simple Robots.txt exclusion that is based on keywords?

User-agent: *
Disallow-by-key: cialis
Disallow-by-key: viagra
Disallow-by-key: xxx
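If a crawler ever chose to honor a directive like this, the matching logic would be trivial. Here is a rough sketch in Python. To be clear, Disallow-by-key is made up, and a real implementation would at least need to handle per-agent sections and multi-word phrases:

def parse_disallow_keys(robots_txt):
    """Collect values from the hypothetical Disallow-by-key lines of a robots.txt body."""
    keys = []
    for line in robots_txt.splitlines():
        line = line.split('#', 1)[0].strip()  # drop comments and surrounding whitespace
        if line.lower().startswith('disallow-by-key:'):
            keys.append(line.split(':', 1)[1].strip().lower())
    return keys

def should_index(page_text, keys):
    """Refuse to index any page whose text contains a banned keyword."""
    text = page_text.lower()
    return not any(key in text for key in keys)

robots = """User-agent: *
Disallow-by-key: cialis
Disallow-by-key: viagra
Disallow-by-key: xxx"""

keys = parse_disallow_keys(robots)
print(should_index("Tips for feeding your new puppy", keys))  # True: page gets indexed
print(should_index("buy cheap cialis online", keys))          # False: page is excluded

Note the deliberate naivety here: plain substring matching means a key like “xxx” would also flag innocent strings that happen to contain it.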

I understand that there are shortcomings. Maybe one day there will be a legitimate reason to include the phrase “hardcore threesome” on my site, but I am willing to risk losing that one page's potential rankings in return for the peace of mind of not getting spammed like crazy and risking the reputation of my site.

Just thinking out loud.
