Google has been improving the way Googlebot crawls dynamic web pages for some time - but the topic is attracting renewed interest just at the moment.

Exactly how Google scans sites for information to index is never made clear, for fear that the knowledge would be useful in black hat SEO. However, many think that white hat SEO is not only a valid practice but one that puts good content where it belongs. Think of it as leveling the search bot playing field.

A big problem is how search bots deal with dynamic content - and JavaScript/Ajax is a tough case. For example, a page might never use a URL to move to another page but simply refresh its contents, either every so often or when the user clicks a button to see more. Such dynamic sites are very attractive from both the developer's and the user's point of view - but from the search engine's point of view the site consists of one default page on which nothing ever changes.
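A minimal sketch of this "one page, no URLs" pattern might look like the following - the `/items` endpoint, the element ids and the page state are all invented for illustration:

```javascript
// Sketch of a page that updates in place via Ajax: each click fetches the
// next batch of items and replaces the page body, so the address bar never
// changes and a markup-only crawler sees just one static page.

// Pure helper: given the current page number, produce the Ajax endpoint
// for the next batch. (The /items endpoint is an assumption.)
function nextBatchUrl(page) {
  return "/items?page=" + (page + 1);
}

// In a browser this would be wired up roughly like so:
//   let page = 0;
//   document.getElementById("more").addEventListener("click", () => {
//     fetch(nextBatchUrl(page)).then(r => r.text()).then(html => {
//       document.getElementById("content").innerHTML = html; // replaced in place
//       page += 1;                                           // no navigation occurs
//     });
//   });
```

Note that `/items?page=2` never appears anywhere in the page's HTML - it only exists once the script runs.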

In the past Google has encouraged developers to avoid using JavaScript to deliver content or links to content because of the difficulty of indexing dynamic content. Over time, however, the Googlebot has incorporated ways of searching content that is provided via JavaScript.

Now it seems the Googlebot has become so good at the task that Google is asking us to allow it to scan the JavaScript used by our sites.

Of course, we don't have details of exactly what the Googlebot does with the JavaScript it finds, but it is now clear that it does do something. A recent blog post, responsible for much of the media attention to JavaScript indexing, documents a sighting in the server logs of the Googlebot accessing JavaScript files.

In fact the observation goes a little further than this: it shows the Googlebot requesting a dynamic URL that can only be constructed by running code in response to a button click. Yes, the Googlebot now appears not only to emulate a human clicking the button, but also to run the JavaScript activated by the click and then follow the link so constructed.
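To see why this matters, consider a hypothetical handler of the kind described - the path and parameter names here are invented:

```javascript
// The link target does not exist anywhere in the HTML; it is assembled by
// JavaScript only when the button is clicked.
function buildDynamicUrl(category, itemId) {
  // Neither string below appears as a complete URL in the page source.
  return "/ajax/" + category + "/view?item=" + encodeURIComponent(itemId);
}

// Browser wiring (illustrative):
//   button.addEventListener("click", () => {
//     window.location.href = buildDynamicUrl("widgets", "42");
//   });
```

To discover `/ajax/widgets/view?item=42` a crawler has to execute this code and simulate the click - merely parsing the markup finds nothing.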

This is a small step for a Googlebot but a huge leap in the indexing of the web. Previously its efforts at working with JavaScript seemed to be limited to retrieving comments added via a JavaScript Ajax call. Google also tried to get developers to standardize the way dynamic content was delivered so that it could be crawled - see Making AJAX Applications Crawlable - but the proposal has been more or less ignored.
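That proposal worked by having pages expose their state after a `#!` ("hashbang") in the URL, which the crawler would then re-request as an ordinary query parameter the server could answer with a static snapshot. A simplified sketch of the URL mapping - the real specification also prescribes exact character escaping rules:

```javascript
// Map a hashbang URL to the "_escaped_fragment_" form a crawler would fetch,
// per Google's (now largely abandoned) Ajax crawling scheme. Simplified.
function escapedFragmentUrl(url) {
  const i = url.indexOf("#!");
  if (i === -1) return url;            // not a crawlable-Ajax URL
  const base = url.slice(0, i);
  const fragment = url.slice(i + 2);   // page state after the hashbang
  const sep = base.includes("?") ? "&" : "?";
  return base + sep + "_escaped_fragment_=" + fragment;
}
```

So `http://example.com/page#!key=value` would be fetched by the crawler as `http://example.com/page?_escaped_fragment_=key=value`, and the server was expected to return a pre-rendered snapshot of that state.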

If you think about the problem for a moment you will realize how difficult web crawling has become with the rise of Ajax. When a webbot encounters a button, say, it no longer necessarily means a form submission - it could activate a content update via JavaScript/Ajax or via a REST call using a generated URL. The only sensible thing for the webbot to do is to actually load and run the page the way a modern browser does, and to interact with it much as a modern user would. This is difficult.
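A toy illustration of the gap between the two approaches - the page model here is just a stand-in for a real DOM, with one link visible in the markup and one URL that exists only inside an event handler:

```javascript
// A fake "page": one link a markup parser can see, one button whose target
// URL is only produced when its handler runs.
const page = {
  links: ["/about"],                           // visible in the HTML
  buttons: [
    { onClick: () => "/api/products?page=2" }  // exists only at runtime
  ]
};

// Markup-only crawler: sees just the static hrefs.
function crawlStatic(page) {
  return page.links.slice();
}

// Script-executing crawler: "clicks" every button and collects whatever
// URL each handler produces, in addition to the static links.
function crawlWithExecution(page) {
  const discovered = page.buttons.map(b => b.onClick());
  return page.links.concat(discovered);
}
```

The static crawl finds one URL; the executing crawl finds two. Scale that up to real pages with arbitrary handlers, timers and generated requests and the cost of doing it properly becomes clear.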

Notice that the JavaScript has to be executed on the Google servers running the Googlebot. This has led to speculation about whether it might be possible to include JavaScript on a site that used the Google cloud to compute something. For example, imagine that you set up a JavaScript program to compute the first n digits of Pi, or a Bitcoin miner, and had the result formed into a custom URL - which the Googlebot would then try to access as part of its crawl. By looking at, say, the query part of the URL in your server logs, you might be able to get the result back.
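Purely as a speculative sketch of the idea - with a trivial sum standing in for the expensive work, and an invented `/collect` endpoint:

```javascript
// Speculative "free computation" sketch: do some work in JavaScript and
// encode the answer into a URL, hoping the crawler executes the script and
// requests that URL, leaving the answer in the site's server logs.
function expensiveComputation() {
  // Stand-in for the real work (digits of Pi, mining, etc.):
  // here just the sum 1 + 2 + ... + 1000.
  let sum = 0;
  for (let i = 1; i <= 1000; i++) sum += i;
  return sum;
}

function resultUrl() {
  // A crawler's request for this URL would record the result server-side.
  return "/collect?result=" + expensiveComputation();
}
```

Whether this would work in practice depends entirely on how much CPU time the Googlebot is prepared to spend per page, which is not public - but the shape of the trick is that simple.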

Other people have reported strange things happening to shopping carts. One rumor is that large orders are being placed consisting of one of each item on sale, only to be aborted at the last minute.

If you know of any verifiable incidents of bad Googlebot JavaScript interaction then let us know.

More Information

Google Bot now crawls arbitrary Javascript sites

Making AJAX Applications Crawlable

Related Articles

Googlenomics - Experimental SEO

Google Needs a New Search Algorithm

Search Engines
