If you’ve never read my blog before, welcome. I’m the head of the webspam team at Google. And I have a blog for days just like this.

Okay, first off you should go read this post. It’s entitled “Me Against Google” and the author is unhappy that talkorigins.org was nowhere to be found in Google for the last 5-6 days. After that post, go read this Slashdot post, entitled “Google De-indexes Talk.Origins, Won’t Say Why.” By the time you’re done, your pulse should be pounding. Hell, you should be angry. Damn that evil Google for not communicating with webmasters!! Or as Wesley put it in his blog:

You might think that a company that prides itself upon advanced textual analysis and automated decision-making algorithms might provide helpful warning messages to webmasters concerning problems found in their sites. You would be wrong.

Okay, ready for my side of the story? Here’s the timeline of how things happened:

– talkorigins.org was hacked on November 18th. I know this because Wesley says so in his blog post.

– By November 27th, Google had detected spammy links and text on talkorigins.org. In case you’re wondering, here’s what the cracker added:

<script>document.write(String.fromCharCode(60,100,105,118,32,115,116,121,108,101,61,39,100,105,115,112,108,97,121,58,110,111,110,101,39,62))</script><br><a href="http://vvu.edu.gh/images/?i=animal-porn">animal porn</a>, <a href="http://vvu.edu.gh/images/?i=animal-sex">animal sex</a>, <a href="http://vvu.edu.gh/images/?i=beastiality">beastiality</a>, <a href="http://vvu.edu.gh/images/?i=rape-sex">rape sex</a>, <a href="http://vvu.edu.gh/images/?i=sleeping-sex">sleeping sex</a>, <a href="http://deepx.com/images/?i=animal-porn">animal porn</a>, <a href="http://deepx.com/images/?i=beastiality">beastiality</a>, <a href="http://deepx.com/images/?i=dog-porn">dog porn</a>, <a href="http://deepx.com/images/?i=horse-porn">horse porn</a>, <a href="http://deepx.com/images/?i=rape-sex">rape sex</a>, <a href="http://deepx.com/images/?i=sleeping-sex">sleeping sex</a>, <a href="http://theoi.com/image/?i=animal-porn">animal porn</a>, <a href="http://theoi.com/image/?i=animal-sex">animal sex</a>, <a href="http://theoi.com/image/?i=beastiality">beastiality</a>, <a href="http://ugobe.com/media/?i=dvd-covers">dvd covers</a>, <a href="http://ugobe.com/media/?i=dvd-ripper">dvd ripper</a>, <a href="http://ugobe.com/media/?i=psp-downloads">psp downloads</a>, <a href="http://ugobe.com/media/?i=psp-games">psp games</a>, <a href="http://ugobe.com/media/?i=psp-movies">psp movies</a>

Not pretty stuff: lots of text about rape and animal porn. In case you’re wondering, the JavaScript at the beginning produces the string “<div style='display:none'>”, which hides the entire section of spammy junk that follows it. So talkorigins.org has these porn words and spammy links, and it’s all hidden via sneaky JavaScript.
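If you want to check that decoding yourself, here’s a minimal sketch in JavaScript (runnable in Node) that turns the character codes from the injected script back into text and flags the display:none trick. The helper names `decodeCharCodes` and `findFromCharCodeWrites` are hypothetical, and the regex assumes the arguments are plain comma-separated decimal numbers, as in this particular hack. It’s an illustration for webmasters auditing their own pages, not a description of how Google actually detects hidden text.

```javascript
// Character codes copied from the injected <script> shown above.
const codes = [60, 100, 105, 118, 32, 115, 116, 121, 108, 101, 61, 39, 100,
  105, 115, 112, 108, 97, 121, 58, 110, 111, 110, 101, 39, 62];

// Hypothetical helper: turn UTF-16 code units back into a string,
// exactly as String.fromCharCode does inside document.write(...).
function decodeCharCodes(list) {
  return String.fromCharCode(...list);
}

// Hypothetical helper: find every document.write(String.fromCharCode(...))
// call in a chunk of HTML and decode its arguments.
// Assumption: the arguments are literal decimal integers separated by commas.
function findFromCharCodeWrites(html) {
  const pattern = /document\.write\(\s*String\.fromCharCode\(([\d,\s]+)\)\s*\)/g;
  const results = [];
  for (const match of html.matchAll(pattern)) {
    const nums = match[1].split(",").map((s) => Number(s.trim()));
    results.push(decodeCharCodes(nums));
  }
  return results;
}

const decoded = decodeCharCodes(codes);
console.log(decoded); // <div style='display:none'>

// Flag decoded output that opens an element hidden with display:none.
const hidesContent = /display\s*:\s*none/i.test(decoded);
console.log(hidesContent); // true

// Scan a small HTML sample built around the same script (example.com is a placeholder).
const sample = "<script>document.write(String.fromCharCode(" + codes.join(",") +
  "))</script><br><a href=\"http://example.com/\">spam</a>";
const found = findFromCharCodeWrites(sample);
console.log(found.length, found[0]); // 1 <div style='display:none'>
```

Crackers can vary the obfuscation (hex escapes, string concatenation, external scripts), so a pattern match like this only catches the exact form seen here; viewing the rendered page with CSS disabled is a broader sanity check.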

We have pretty good reason to believe that this site was hacked, but it’s still causing problems for regular users, so Google has to take action. Here’s what we do:

– By November 27th, the site was classified as hacked and spammy. We stopped showing it for user queries.

– By November 27th, we started flagging this site as penalized in Google’s webmaster console. I believe that Google is the only search engine that will confirm to webmasters that their site does have penalties. No, we don’t confirm penalties if we think it might clue in web spammers that they’ve been caught. But yes, we do try to confirm penalties if we think a site is legitimate or has been hacked. You can read more about how we confirm penalties in this previous post.

I hear a few people ask, “It’s nice that I can sign up for Google’s webmaster console and learn that Google penalized my site. But couldn’t Google have done more?” Well, it turns out that we did do more:

– By November 28th, we emailed multiple addresses at talkorigins.org to let them know exactly what happened. According to the records I’m looking at, we tried to email contact at talkorigins.org, info at talkorigins.org, support at talkorigins.org, and webmaster at talkorigins.org with a timestamp of 2006-11-28 14:24:15. Here’s an excerpt from the email that we sent:

Dear site owner or webmaster of talkorigins.org,

While we were indexing your webpages, we detected that some of your pages were using techniques that were outside our quality guidelines, which can be found here: http://www.google.com/webmasters/guidelines.html

In order to preserve the quality of our search engine, we have temporarily removed some webpages from our search results. Currently pages from talkorigins.org are scheduled to be removed for at least 60 days.

Specifically, we detected the following practices on your webpages:

* The following hidden text on talkorigins.org: e.g.

animal porn, animal sex, beastiality, rape sex, sleeping sex, animal porn, beastiality, dog porn, horse porn, rape sex, sleeping sex, animal porn, animal sex, beastiality, dvd covers, dvd ripper, psp downloads, psp games, psp movies

…

We would prefer to have your pages in Google’s index. If you wish to be reincluded, please correct or remove all pages that are outside our quality guidelines. When you are ready, please visit: https://www.google.com/webmasters/sitemaps/reinclusion?hl=en to learn more and request a reinclusion request.

…

You can read more about how we try to email webmasters about issues on their site in this previous post. According to his post, Wesley filed a reinclusion request recently, and I’ve confirmed that the reinclusion request was approved, so I expect talkorigins.org to be back in Google within 24-48 hours.

But let’s take a step back. This site was hacked and stuffed with a bunch of hidden spammy porn words and links. Google detected the spam in less than 10 days; that’s faster than the site owner noticed it. We temporarily removed the site from our index so that users wouldn’t get the spammy porn back in response to queries. We made it possible for the webmaster to verify that their site was penalized. Then we emailed the site with the exact page and the exact text that was causing problems. We provided a link to the correct place for the site owner to request reinclusion. We also set the penalty to last a relatively short time (60 days), so that if the webmaster fixed the issue but didn’t contact Google, the site would still be fine once the penalty expired.

Ultimately, each site owner is responsible for making sure that their site isn’t spammy. If you pick a bad search engine optimizer (SEO) and they make a ton of spammy doorway pages on your domain, Google still needs to take action. Hacked sites are no different: lots of spammy/hacked sites will try to install malware on users’ computers. If your site is hacked and turns spammy, Google may need to remove your site, but we will also try to alert you via our webmaster console and even by emailing you to let you know what happened. To the best of my knowledge, no other search engine confirms any penalties to sites, nor do they email site owners.

Wesley and anyone else who works on talkorigins.org, I’m sorry that this was a stressful experience for you. Could Google do a better job? Absolutely, and we’ll keep working on it. For example, maybe we can show a more specific message for hacked sites in the webmaster console. Google could also try to identify better email addresses when writing to site owners. For example, for talkorigins.org, there are email addresses such as “archive@” and “submissions@” that we could have used instead and that might have reached the right person. I’m open to other suggestions too. But please give Google a little bit of credit, because I do think we’re doing more to alert webmasters to issues than any other search engine.

Note to new readers of my blog: I pre-moderate my comments, and it’s after 2 a.m. and I’m going to bed now. If your comment doesn’t show up immediately, it’s waiting for me to approve it after I wake up. 😉