Whitehouse.gov may be trying to hide the Snowden pardon petition from search engine results. Why might Google be indexing the content anyway?

I’ve been following the Edward Snowden – NSA saga for the past week or so with fascination, as I suspect some of you have as well. Last night over dinner, my wife and I were pondering what the final outcome of all this might be, depending on what happens between Russia (or now Ecuador) and the US in the coming days. I wondered: might there be any chance of an eventual pardon for Snowden from the White House on Obama’s last day in office? There must be some discussion of whether a pardon could be in the works, right? So I consulted the Oracle of Google, searching for “pardon Edward Snowden.”

Below a few news-feed results, the number one organic result is a subdomain of Whitehouse.gov. This ‘petitions’ subdomain lets citizens create, manage, and promote petitions to our government. If a petition receives more than 100,000 signatures, the administration has committed to addressing it with an official response on the matter in question.

What is immediately curious to anyone with a trained eye for search marketing is that the result from Petitions.whitehouse.gov is ranking highly despite the page being disallowed by the subdomain’s robots.txt file.
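For reference, the rule in question is a standard Robots Exclusion Protocol directive. A hedged sketch of the relevant portion of the file (the live robots.txt may well contain other directives beyond this one) looks like this:

```
User-agent: *
Disallow: /petition
```

Any compliant crawler reading this will skip every URL whose path begins with /petition.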

My first thought leaned toward ‘conspiracy theory’: does Whitehouse.gov want to stifle the discoverability of the petition to pardon Snowden, so they took proactive steps to get the page out of the index? I’m not one who generally subscribes to paranoia, but given what Snowden revealed with PRISM, maybe we would be right to be more skeptical.

Was the page indexed by Google and then Whitehouse.gov later updated its robots.txt file in order to block it? After some quick digging, I uncovered a few things worth noting:

The rule Disallow: /petition blocks the entire directory, so this issue is systemic to all pages in the directory, not specific to this individual page.

There is no noindex meta tag at the page level.

The robots.txt file was last cached on 6/13/13, and as of that date the /petition/ directory was already blocked, so there was not a (very) recent update to the robots file to block this page or directory.
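The findings above can be sanity-checked programmatically. Here is a minimal sketch using Python’s standard-library robots.txt parser, with the Disallow rule reproduced inline; the petition URL below is illustrative, not the exact address:

```python
from urllib.robotparser import RobotFileParser

# Reproduce the rule observed in the subdomain's robots.txt.
# (Assumption: the live file may contain additional directives.)
rules = [
    "User-agent: *",
    "Disallow: /petition",
]

rp = RobotFileParser()
rp.parse(rules)

# An illustrative petition URL under the blocked directory.
petition_url = "http://petitions.whitehouse.gov/petition/pardon-edward-snowden"

print(rp.can_fetch("Googlebot", petition_url))  # False: crawling disallowed
print(rp.can_fetch("Googlebot", "http://petitions.whitehouse.gov/"))  # True
```

Because the block applies to the whole directory prefix, every petition page is disallowed, which matches the first finding above.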

While Google will not crawl a page blocked by robots.txt, it will at times determine that a page has merit based on reference factors outside the page (read: link quality and volume, anchor text, social factors) and thereby show the page’s URL in its results.

Initial Conclusion: No after-the-fact steps appear to have been taken by webmasters at Whitehouse.gov to block the Snowden petition page in particular. The actual content of the page is not shown in the indexed result, but Google did choose to include the page URL in its result set because of other signals indicating that it’s a relevant page (to Google, it’s the number one standard search result out of ~571,000).

The Bigger Question: Why is Whitehouse.gov choosing to block search engines from indexing content of their petition pages, when these pages are created by the people and for the people to express and promote concerns to their government leaders? I cannot think of a good rationale for this. Can you? Is anyone out there at Sunlight.org listening?

I’ve created a petition page on Petitions.whitehouse.gov asking the Obama administration to remove the robots.txt disallow from petitions on their site. This would promote the transparency, and the conduit for democracy in action, that the web platform was created to serve in the first place.

Find the petition here and pass the URL along to your networks.

People may have trouble finding my new petition via search engines, so that will make it harder to achieve the 100,000 signatures to garner its due attention. Oh, the delicious irony…